Abstract

Robots must act purposefully and successfully in an uncertain world. Sensory
information is inaccurate or noisy, actions may have a range of effects, and the robot's
environment is only partially and imprecisely modelled. This thesis introduces active
randomization by a robot, both in selecting actions to execute and in focusing on
sensory information to interpret, as a basic tool for overcoming uncertainty.

An example of randomization is given by the strategy of shaking a bin containing
a part in order to orient the part in a desired stable state with some high probability.
Another example consists of first using reliable sensory information to bring two parts
close together, then relying on short random motions to actually mate the two parts,
once the part motions lie below the available sensing resolution. Further examples
include  tapping  parts  that  are  tightly  wedged,  twirling  gears  before  trying  to  mesh 
them,  and  vibrating  parts  to  facilitate  a  mating  operation. 

Randomization  is  seen  as  a  primitive  strategy  that  arises  naturally  in  the  solution 
of  manipulation  tasks.  Randomization  is  as  essential  to  the  solution  of  tasks  as  are 
sensing  and  mechanics.  An  understanding  of  the  way  that  randomization  can  facilitate 
task  solutions  is  integral  to  the  development  of  a  theory  of  manipulation.  Such  a 
theory  should  try  to  explain  the  relationship  between  solvable  tasks  and  repertoires 
of  actions,  with  the  aim  of  creating  autonomous  systems  capable  of  existing  in  an 
uncertain  world. 

The  thesis  expands  the  existing  framework  for  generating  guaranteed  strategies 
to  include  randomization  as  an  additional  operator.  A  special  class  of  randomized 
strategies  is  considered  in  detail,  namely  the  class  of  simple  feedback  loops.  A  simple 
feedback  loop  repeatedly  considers  only  current  sensed  values  in  deciding  on  actions 
to  execute  in  order  to  make  progress  towards  task  completion.  When  progress  is  not 
possible  the  feedback  loop  executes  a  randomizing  motion.  The  thesis  shows  that  if 
the  average  velocity  of  the  system  points  towards  the  goal,  then  the  system  converges 
to  the  goal  rapidly. 

A  simple  feedback  loop  was  implemented  on  a  robot.  The  task  consisted  of 
inserting  a  peg  into  a  hole  using  only  position  sensing  and  randomization.  The 
implementation  demonstrated  the  usefulness  of  randomization  in  solving  a  task  for 
which  sensory  information  was  poor. 
Detailed Abstract


Robots  must  act  purposefully  and  successfully  in  an  uncertain  world.  Sensory 
information  is  inaccurate  or  noisy,  actions  may  have  a  range  of  effects,  and  the  robot's 
environment  is  only  partially  and  imprecisely  modelled.  This  thesis  introduces  active 
randomization  by  a  robot,  both  in  selecting  actions  to  execute  and  in  focusing  on 
sensory  information  to  interpret,  as  a  basic  tool  for  overcoming  uncertainty. 

An  example  of  randomization  is  given  by  the  strategy  of  shaking  a  bin  containing 
a  part  in  order  to  orient  the  part  in  a  desired  stable  state  with  some  high  probability. 
Another  example  consists  of  first  using  reliable  sensory  information  to  bring  two  parts 
close  together,  then  relying  on  short  random  motions  to  actually  mate  the  two  parts, 
once  the  part  motions  lie  below  the  available  sensing  resolution.  Further  examples 
include tapping parts that are tightly wedged, twirling gears before trying to mesh
them,  and  vibrating  parts  to  facilitate  a  mating  operation.  Randomization  is  also 
useful  for  mobile  robot  navigation  and  as  a  means  of  guiding  the  design  process. 

Over  the  past  several  years  a  planning  methodology  [LMT]  has  evolved  for 
synthesizing  strategies  that  are  guaranteed  to  solve  robot  tasks  in  the  presence  of 
uncertainty. Traditionally such strategies make judicious use of sensing and task
mechanics, in conjunction with the maintenance of past sensory information and the
prediction of future behavior, in order to overcome uncertainty. There are two
restrictions on the generality of this approach. First, not all tasks admit guaranteed
solutions. Uncertainty simply may be too great to guarantee task success in a
specific number of steps. Second, a strategy is only as good as the validity of
its assumptions. In an uncertain world all assumptions are subject to uncertainty.
For  instance,  there  may  be  unmodelled  parameters  that  govern  the  behavior  of  a 
system.  This  fundamental  uncertainty  limits  the  guarantees  that  one  can  expect 
from  any  strategy. 

The  randomization  approach  proposed  in  this  thesis  attempts  to  bridge  these 
difficulties.  First,  the  underlying  philosophy  of  a  randomized  strategy  assumes  that 
several attempts may need to be made at solving a task. A task is only assumed to
be  solvable  with  some  probability  on  any  given  attempt.  This  view  of  a  solution  to  a 
task  broadens  the  class  of  solvable  tasks.  Second,  by  actively  randomizing  its  actions 
a  system  can  blur  the  significance  of  unmodelled  or  uncertain  parameters.  Effectively 
the  system  is  perturbing  its  task  solutions  slightly  through  randomization.  The  intent 
is  to  obtain  probabilistically  a  solution  that  is  applicable  for  particular  instantiations 
of  these  unknown  parameters. 

An  understanding  of  the  way  that  randomization  can  facilitate  task  solutions 
is  integral  to  the  development  of  a  theory  of  manipulation.  Such  a  theory  should 
try  to  explain  the  relationship  between  solvable  tasks  and  repertoires  of  actions, 
with  the  aim  of  creating  autonomous  systems  capable  of  existing  in  an  uncertain 
world.  Randomization  is  seen  as  a  primitive  strategy  that  arises  naturally  in  the 
solution of manipulation tasks. Randomization is as essential to the solution of
tasks as are sensing and mechanics. By formally introducing randomization into the
theory of manipulation, the thesis provides one further step towards understanding
the relationship of tasks and strategies.

The  thesis  expands  the  existing  framework  for  generating  guaranteed  strategies 
to  include  randomization  as  an  additional  operator.  A  special  class  of  randomized 
strategies  is  considered  in  detail,  namely  the  class  of  simple  feedback  loops.  A  simple 
feedback  loop  repeatedly  considers  only  current  sensed  values  in  deciding  on  actions  to 
execute  in  order  to  make  progress  towards  task  completion.  Integral  to  the  definition 
of  a  simple  feedback  loop  in  this  thesis  is  the  notion  of  a  progress  measure.  Distance 
to  the  goal  can  serve  as  a  progress  measure  as  can  some  nominal  plans  developed 
under  the  assumption  of  no  uncertainty.  When  progress  is  not  possible  the  feedback 
loop  executes  a  randomizing  motion.  The  thesis  shows  that  if  the  average  velocity 
of  the  system  relative  to  the  progress  measure  points  towards  the  goal,  then  the 
system  converges  to  the  goal  rapidly.  In  particular,  the  expected  time  to  attain  the 
goal  is  bounded  by  the  maximum  progress  label  divided  by  the  minimum  expected 
velocity.  A  simple  feedback  loop  in  the  plane  is  analyzed.  It  is  shown  that  the  rapid 
convergence  regions  of  this  randomized  strategy  are  considerably  better  than  those 
for  a  corresponding  guaranteed  strategy. 

As  part  of  the  thesis,  a  simple  feedback  loop  was  implemented  on  a  robot.  The  task 
consisted  of  inserting  a  peg  into  a  hole  using  only  position  sensing  and  randomization. 
The implementation demonstrated the usefulness of randomization in solving a task
for  which  sensory  information  was  poor. 

The  development  of  randomized  strategies  is  undertaken  in  the  discrete  and 
continuous  domains.  Most  of  the  technical  results  are  proved  in  the  discrete  domain, 
with  extensions  to  the  continuous  domain  indicated. 
Chapter 1
Introduction 


The  goal  of  robotics  is  to  understand  physical  interaction,  and  to  use  that 
understanding  towards  endowing  machines  with  the  autonomous  capability  of 
operating  productively  in  the  world.  Towards  realizing  this  goal,  a  large  body  of  work 
has  been  concerned  with  the  problem  of  providing  robots  the  ability  to  automatically 
synthesize  solutions  to  tasks  specified  in  high-level  terms.  Of  central  importance  in 
synthesizing  these  solutions  is  the  repertoire  of  primitive  actions  that  are  available  to 
a  robot.  It  is  evident  that  the  form  or  even  existence  of  a  solution  depends  on  the 
actions  available.  In  turn,  the  actions  that  one  is  likely  to  consider  depend  strongly 
on  one's  view  of  the  world.  In  recent  years,  the  key  obstacle  to  successfully  planning 
and  executing  task  solutions  has  been  uncertainty.  Uncertainty  arises  in  a  variety 
of  forms.  Often  uncertainty  arises  from  run-time  errors  in  sensing  or  control.  Other 
causes  of  uncertainty  may  be  one's  lack  of  knowledge  in  modelling  a  system  or  an 
environment.  The  realization  that  uncertainty  plays  a  fundamental  role  in  physical 
interaction  has  changed  the  character  of  primitive  actions  deemed  necessary  to  solve 
particular  robot  tasks.  For  instance,  in  a  perfect  world  it  may  be  enough  to  specify 
actions of the form MOVE FROM A TO B, assuming that the path from A to B is free.
In  a  world  with  uncertainty  it  may  be  impossible  to  guarantee  the  success  of  such 
an  action.  The  work  on  uncertainty  over  the  past  two  decades  may  be  interpreted 
as  searching  for  various  primitive  actions  and  methods  of  action  combination  that 
extend  the  class  of  tasks  solvable  in  the  presence  of  uncertainty. 

The  archetypical  primitive  action  is  often  simply  a  motion  in  a  particular 
direction.  Sensors  determine  when  an  action  should  be  initiated  and  when  it  should 
be  terminated.  Actions  are  combined  by  a  planning  or  execution  system  whose 
responsibility  it  is  to  ensure  that  a  task  is  completed.  The  outcome  of  a  given 
action  may  be  non-deterministic,  as  uncertainty  may  yield  a  possible  range  of  results 
rather  than  a  unique  result  at  the  termination  of  an  action.  Actions  may  have  non- 
deterministic  outcomes,  but  generally  the  action  to  be  performed  at  a  given  stage  in 
the  solution  of  the  task  is  deterministically  fixed  as  a  function  of  sensor  values. 

Other  types  of  primitive  actions  are  imaginable.  For  instance,  instead  of  choosing 
actions  deterministically  as  a  function  of  sensory  inputs,  a  system  could  select  a 
motion  randomly  from  a  set  of  possible  motions.  Equivalently,  a  system  might 
randomly hallucinate sensor values when actual sensor values are not sufficiently
precise  to  guide  the  progress  of  a  task  solution.  More  simply,  a  given  action  may 
attain  a  particular  goal  only  with  some  non-zero  probability  of  success  but  not  with 
certainty.  Nonetheless,  if  the  action  is  repeatable  then  it  makes  sense  to  retain  the 
action  in  one’s  repertoire.  This  is  because  one  can  under  suitable  conditions  ensure 
eventual  success  by  placing  a  loop  around  the  action.  These  suitable  conditions 
postulate  the  absence  of  trap  states  and  lower  bounds  on  the  probability  of  success. 
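
The idea of placing a loop around a repeatable probabilistic action can be sketched in a few lines. The following is only an illustration under stated assumptions: the action is modelled as a hypothetical function that succeeds independently with a fixed probability p on each try (so there are no trap states and p is itself the lower bound on the success probability). The names `retry_until_success`, `make_action`, and `mean_attempts` are ours and do not come from the thesis.

```python
import random


def retry_until_success(attempt, max_attempts=10_000):
    """Repeat a probabilistic action until it succeeds; return the attempt count."""
    for n in range(1, max_attempts + 1):
        if attempt():
            return n
    raise RuntimeError("no success within max_attempts")


def make_action(p, rng):
    """Hypothetical repeatable action that succeeds with probability p on each try."""
    return lambda: rng.random() < p


def mean_attempts(p, trials=20_000, seed=0):
    """Estimate the average number of attempts the loop needs before success."""
    rng = random.Random(seed)
    action = make_action(p, rng)
    return sum(retry_until_success(action) for _ in range(trials)) / trials
```

Under these assumptions one would expect `mean_attempts(p)` to lie close to 1/p, since the number of attempts follows a geometric distribution. The cap `max_attempts` is only a safeguard for the simulation and is not part of the strategy.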

We  will  refer  to  actions  in  which  random  choices  are  made  or  in  which  the  outcome 
is  probabilistically  determined  as  randomized  or  probabilistic  actions,  respectively. 
The  purpose  of  this  thesis  is  to  investigate  the  use  of  randomization  in  the  solution 
of  robot  tasks.  Randomized  and  probabilistic  actions  are  viewed  as  additional  types 
of  primitive  actions  whose  existence  is  essential  to  the  solution  of  many  tasks. 

The  advantages  to  be  gained  from  randomization  are  three-fold.  First, 
randomization  increases  the  class  of  solvable  tasks  beyond  those  solvable  by  bounded- 
step  guaranteed  strategies.  This  is  because  a  randomized  strategy  need  not  solve 
a  task  in  a  specific  number  of  steps,  but  must  merely  ensure  convergence  in  an 
expected sense. Second, by tolerating local failures and circumventing these with
randomization, a strategy becomes less sensitive to task details, which reduces
brittleness. Third, randomization simplifies the planning process.


1.1 A Peg-In-Hole Problem

Consider  the  task  of  placing  a  rectangular  peg  into  a  rectangular  hole.  See  figure 
1.1.  One  of  the  experiments  conducted  for  this  thesis  inserted  such  a  peg  using  a 
strategy  that  combined  sensing  and  randomization.  The  task  system  consisted  of  a 
PUMA  robot  that  manipulated  the  peg,  and  a  camera  system  that  provided  position 
sensing. 
Figure 1.2: Rough sketch of the run-time character of a strategy that uses a
combination of sensing and randomization to attain the goal.
Combining Sensing and Randomization

The  nature  of  the  strategy  is  roughly  sketched  in  figure  1.2.  The  basic  principle  of  the 
strategy  is  to  make  use  of  sensory  information  when  possible,  and  otherwise  to  execute 
a  randomizing  motion.  The  purpose  of  the  randomizing  motion  is  to  either  attain  the 
goal  or  move  to  a  location  from  which  the  sensor  again  provides  useful  information. 
The sensing errors are represented in the figure with an error ball. For configurations
of  the  system  far  away  from  the  goal  the  resulting  sensing  information  may  adequately 
suggest  an  approach  direction  that  is  guaranteed  to  reduce  the  system’s  distance  from 
the  goal.  In  the  figure  this  is  indicated  by  a  pair  of  long  straight-line  motions,  one 
of  which  actually  attains  the  goal.  However,  when  the  system  is  near  the  goal,  the 
sensors  may  not  be  able  to  distinguish  on  which  side  of  the  goal  the  system  is.  In 
this  case,  the  system  will  execute  a  randomizing  motion.  A  possible  execution  trace 
of  such  motions  is  shown  in  the  figure. 

A  Three-Degree-of-Freedom  Strategy 

Let us examine this strategy in more detail for the peg-in-hole problem.

The  problem  was  restricted  to  a  three-dimensional  task,  instead  of  the  full 
six-dimensional  problem  inherent  to  an  object  with  three  translational  and  three 
rotational  degrees  of  freedom.  It  was  assumed  that  the  peg  was  properly  aligned 
vertically.  This  was  achieved  by  picking  up  the  peg  from  a  horizontal  table.  However, 
the  peg  was  permitted  to  be  misaligned  about  the  vertical  axis.  The  translational 
degree  of  freedom  corresponding  to  the  peg’s  height  above  the  hole  was  removed  by 
making  contact  between  the  peg  and  the  horizontal  plate  surrounding  the  hole.  Thus 
the  peg's  remaining  three  degrees  of  freedom  consisted  of  two  translational  degrees 
of  freedom  in  the  plane  perpendicular  to  the  vertical  axis,  and  a  rotational  degree 
of  freedom  about  this  axis.  The  axis  of  the  hole  was  assumed  to  be  parallel  to  the 
vertical  axis. 

The  system  operated  as  follows.  The  camera  was  mounted  above  the  assembly, 
looking  straight  down.  The  system  would  take  a  picture,  extract  edges,  then  try  to 
match  these  to  the  edges  of  the  hole  and  the  edges  of  the  peg.  Figure  1.3  depicts  an 
idealized  picture.  The  hole  was  backlit  from  below  by  a  light,  so  that  the  edges  visible 
to  the  camera  were  primarily  those  bounding  the  open  part  of  the  hole.  Having  fixed 
on  a  match  of  image  edges  to  the  peg  and  the  hole,  the  system  would  generate  a 
planar  motion  consisting  of  a  translation  and  a  rotation  that  would  roughly  align  the 
peg  above  the  hole.  Figures  1.4  through  1.6  portray  some  actual  data  obtained  by 
the  camera,  along  with  the  motion  suggested  by  the  system.  The  system  would  then 
try  to  execute  this  motion,  and  take  another  picture.  If  the  picture  indicated  that 
the  peg  was  probably  above  the  hole  and  properly  aligned,  the  system  would  try  to 
insert  the  peg.  The  test  for  proper  alignment  was  visibility  of  a  pair  of  perpendicular 
edges  on  both  the  peg  and  the  hole  that  were  in  close  proximity  and  parallel.  If  the 
peg  was  not  yet  ready  to  be  inserted  into  the  hole,  then  the  system  would  generate 
a  new  motion,  and  proceed  to  try  again.  If  ever  the  system  did  not  obtain  useful 
image  edges  for  suggesting  a  motion,  then  it  would  execute  a  randomizing  motion. 
Figure 1.3: Top view of a peg-in-hole assembly. The camera extracts edges from the
scene.  The  edges  are  used  to  suggest  a  motion  that  will  align  the  peg  over  the  hole. 


Figure  1.4:  This  and  the  next  two  figures  show  some  actual  image  data  obtained  for 
the peg-in-hole strategy outlined in figure 1.3. The lines in this figure were obtained
from  an  image  taken  by  a  camera  looking  down  on  the  peg-in-hole  assembly.  The 
region  bounded  by  the  edges  is  the  portion  of  the  hole  visible  to  the  camera.  The 
hole  was  illuminated  from  below.  The  lines  were  thus  obtained  by  first  thresholding 
the  actual  image,  then  looking  for  zero-crossings. 


Figure 1.5: This figure shows the system's attempt to match the short image edges
to the physical edges of the peg and the hole. The four vertices indicate
the system's interpretation of the endpoints of the physical edges.
Figure 1.6: The outer two solid lines are the system's interpretation of the location
of the hole boundary. The inner two solid lines are the system's interpretation of the
boundary  of  the  peg.  The  two  dashed  lines  indicate  the  system's  suggested  motion. 
Specifically,  if  the  peg  moved  precisely  as  suggested  by  the  system,  it  would  move  to 
the  location  indicated  by  these  lines. 
The motion was selected in a random fashion from a collection of two-dimensional
translations  and  rotations.  In  pseudo-code,  the  strategy  was  of  the  following  form. 

REPEAT until the peg is in the hole:
    1. Take a picture of the assembly from above.
    2. Extract zero-crossing edges from the image.
    3. Try to match the image edges to the peg and the hole.
    4. IF the edges can be matched reliably,
       THEN use these to move the peg towards the hole,
       ELSE execute a random motion.
End_repeat


Pseudo-code  describing  a  randomized  strategy  for  inserting  a  peg  into  a  hole. 
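
The control structure of the pseudo-code can be sketched as a small simulation. This is not the vision system or robot interface used in the experiment. It is a toy model under loudly stated assumptions: the peg is a point in the plane with the hole at the origin, rotation is ignored, edge matching is replaced by a hypothetical `suggest_motion` that fails inside an assumed blind zone near the hole, and the constants other than the 2.5mm random step are invented for illustration.

```python
import math
import random

GOAL_RADIUS = 0.4       # mm; assumed tolerance within which insertion succeeds
BLIND_RADIUS = 3.0      # mm; assumed zone near the hole where no edges can be matched
MAX_RANDOM_STEP = 2.5   # mm; maximum random-motion magnitude quoted in the text
SENSING_NOISE = 1.0     # mm; assumed standard deviation of the suggested motion


def suggest_motion(pos, rng):
    """Stand-in for steps 1-3: a noisy motion towards the hole, or None if unmatched."""
    if math.hypot(pos[0], pos[1]) < BLIND_RADIUS:
        return None
    return (-pos[0] + rng.gauss(0.0, SENSING_NOISE),
            -pos[1] + rng.gauss(0.0, SENSING_NOISE))


def random_motion(rng):
    """A random planar translation of magnitude at most MAX_RANDOM_STEP."""
    length = rng.uniform(0.0, MAX_RANDOM_STEP)
    angle = rng.uniform(0.0, 2.0 * math.pi)
    return (length * math.cos(angle), length * math.sin(angle))


def insert_peg(start, rng, max_motions=100_000):
    """Run the feedback loop from `start`; return the number of motions executed."""
    pos = start
    for motions in range(max_motions + 1):
        if math.hypot(pos[0], pos[1]) <= GOAL_RADIUS:   # peg is in the hole
            return motions
        move = suggest_motion(pos, rng)                  # steps 1-3
        if move is None:                                 # step 4, ELSE branch
            move = random_motion(rng)
        pos = (pos[0] + move[0], pos[1] + move[1])
    raise RuntimeError("peg not inserted within max_motions")
```

A call such as `insert_peg((10.0, 0.0), random.Random(0))` starts the loop 10mm from the hole, comparable to the start configurations described below. The loop itself keeps no memory of past sensor values, which is what makes it a simple feedback loop in the sense of the thesis.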

The x-y dimensions of the hole were 31.75mm x 19mm, while those of the peg
were 31mm x 18mm. The material was aluminum. The random motions within the
feedback  loop  had  maximum  magnitude  of  about  2.5mm.  The  insertion  was  started 
from  various  randomly  chosen  configurations  within  a  radius  of  about  10mm  of  the 
center  of  the  hole.  This  distance  is  well  within  the  accuracy  achievable  using  an  open- 
loop  motion  of  the  PUMA.  Indeed,  the  robot  arm  would  pick  up  the  peg  several  feet 
away  from  the  assembly,  then  move  it  to  within  camera  range  of  the  assembly  using 
a  preprogrammed  motion.  Once  within  camera  range,  the  feedback  strategy  outlined 
above  would  take  control  of  the  assembly. 

Errors  in  Sensing  and  Control 

The  interesting  aspect  of  the  non-randomizing  portion  of  this  strategy  is  that  it  does 
not always succeed. There are two reasons for this. First, the suggested motion
need not be accurate, and second, the camera may not return any useful sensing
information, in which case there is not even a suggested motion. The interesting
failure is the second one, and it is here that randomization plays a useful role. We
will  return  to  this  topic  shortly. 

The  first  type  of  failure  arises  both  because  of  calibration  errors  and  sensing 
uncertainty.  Consider  what  it  takes  to  transform  an  image  motion  into  a  robot 
motion.  There  must  be  some  correspondence  between  the  coordinate  system  of  the 
image  plane  and  the  joint  coordinates  of  the  robot.  Changing  the  position  of  the 
camera  or  refocusing  can  easily  change  this  correspondence.  We  thus  performed  a 
rough  calibration  of  the  camera  with  the  robot  before  each  assembly,  by  executing  a 
set  of  test  motions,  consisting  of  two  perpendicular  translations,  and  a  rotation  about 
a  joint  axis,  to  determine  the  mapping  between  the  group  of  image  motions  and  the 
associated  joint  commands.  The  calibration  was  therefore  very  approximate.  Indeed, 
part of the motivation was to determine how easily one could place the peg into the
hole without requiring fine precision either in sensing or control. It is thus highly
likely  that  the  calibration  contained  a  fixed  but  unknown  bias.  In  other  words,  even 
if  subsequent  sensing  was  perfect,  the  initial  calibration  error  probably  introduced 
an  unknown  error  into  the  suggested  motions.  Thus  it  would  be  highly  unreasonable 
to  expect  the  robot  to  insert  the  peg  into  the  hole  in  a  single  motion.  Additionally, 
there  are  sensing  errors  on  each  iteration.  For  instance,  the  light  below  the  hole 
causes  blooming.  This  means  that  the  image  edges  bulge  out  in  a  curved  fashion, 
thereby  introducing  error  into  the  observed  positions  of  the  peg  and  the  hole.  In 
short,  the  non-randomizing  portion  of  the  strategy  is  not  guaranteed  to  succeed  in  a 
specific  predictable  number  of  steps.  Instead,  the  full  randomized  strategy  operates 
as  a  simple  feedback  loop  that  eventually  succeeds.  This  will  be  explained  further 
below. 

A more serious problem arises when the peg is near the hole. In this case the
camera  may  not  see  any  edges  on  either  the  peg  or  the  hole,  or  may  only  see  small 
fragments  that  it  cannot  reliably  match  to  the  peg  or  the  hole.  In  part  this  is  due  to 
the  placement  of  the  camera.  Inherently,  the  camera  will  be  offset  slightly  to  one  side 
or  the  other  of  the  assembly,  and  thus  will  not  always  be  able  to  see  the  hole.  For 
instance,  viewing  camera,  peg,  and  hole  in  terms  of  their  projections  into  the  plane 
of  assembly,  if  the  peg  is  situated  between  the  camera  and  the  hole,  then  the  camera 
may  not  be  able  to  see  any  edges.  Conversely,  if  the  peg  is  approaching  the  hole  from 
the  far  side  of  the  hole  relative  to  the  camera,  then  the  camera  will  likely  be  able 
to  detect  the  defining  edges  of  the  hole  and  the  peg  throughout  the  approach.  Thus 
there  are  preferred  approach  directions.  Of  course,  the  system  is  not  aware  of  these, 
just  as  it  is  not  aware  of  the  actual  biases  in  the  calibration  and  sensing  information. 


Randomization 

Now  consider  the  state  of  the  assembly  once  the  peg  is  near  the  hole,  supposing  that 
the  camera  cannot  determine  any  edges  with  which  to  suggest  a  next  motion.  In 
order  to  have  some  chance  of  attaining  the  goal,  the  system  must  make  a  motion. 
By  selecting  the  motion  randomly  the  system  can  avoid  any  deterministic  traps 
that might result. For instance, if the system were to choose a motion direction
deterministically,  then  it  might  have  the  bad  fortune  of  moving  to  a  location  from 
which  the  sensors  would  direct  it  right  back  to  the  location  at  which  the  sensors 
provide  no  information.  Thus  the  system  would  be  stuck  in  a  loop.  By  choosing  the 
motion  direction  randomly,  the  system  can  break  out  of  such  a  loop.  So  long  as  there 
is  some  chance  of  attaining  the  goal,  with  probabilities  that  are  uniformly  bounded 
away  from  zero,  the  strategy  will  converge  eventually.  Indeed,  for  this  particular 
implementation  we  chose  the  maximum  step  size  of  the  random  motions  to  be  on 
the  order  of  2.5mm.  Thus  whenever  the  system  was  within  a  few  millimeters  of  the 
goal, it had some chance of attaining the goal upon execution of a random motion.
The  camera  could  always  bring  the  peg  to  within  a  few  millimeters  of  the  hole.  More 
importantly,  however,  the  random  motions  permitted  the  system  to  enter  a  region 
from  which  the  biases  in  the  sensor-robot  calibration  and  in  the  placement  of  the 
camera actually acted in favor of goal attainment. In short, the randomizing aspect
could  actually  ferret  out  approach  directions  from  which  the  biases  were  helping  rather 
than  hindering  the  assembly.  This  is  an  important  property  of  randomized  strategies. 
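
The deterministic trap described above can be made concrete with a deliberately tiny model. The three-state world below is purely hypothetical and is not taken from the experiment: at the state "blind" the sensor gives no information, at "left" the sensor directs the system back to "blind", and the goal lies on the other side of "blind". A fixed choice at "blind" cycles forever, whereas a random choice escapes with probability one.

```python
import random


def step(state, choose):
    """Advance the hypothetical three-state world by one motion."""
    if state == "left":
        return "blind"              # sensor-guided motion leads back to the blind spot
    # state == "blind": no sensor information, so a direction must be chosen
    return "left" if choose() == "left" else "goal"


def run(choose, max_steps=1000):
    """Start at the blind spot; return the final state and the number of motions."""
    state, steps = "blind", 0
    while state != "goal" and steps < max_steps:
        state = step(state, choose)
        steps += 1
    return state, steps
```

Calling `run(lambda: "left")` models a deterministic rule that happens to pick the wrong direction and never reaches the goal within the step limit. Passing a chooser built on `random.Random(...).choice(["left", "right"])` breaks the cycle, because each visit to the blind spot reaches the goal with probability 1/2.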

Convergence  Regions 

For  this  particular  example  the  start  configurations  could  be  roughly  grouped  into 
four  regions  as  indicated  in  figure  1.7.  For  one  of  these  regions,  the  assembly  time  of 
the  strategy  was  very  fast,  namely  three  motions  on  average.  This  region  corresponds 
to the quadrant that was diagonally opposite the camera. For the other regions
the  convergence  times  varied,  although  fourteen  motions  seems  to  have  been  a  rough 
average  (taken  over  fifty  trials).  We  often  observed  the  system  finding  its  way  into 
the  fast  region  with  the  aid  of  randomizing  motions,  then  quickly  attaining  the  goal. 

Analysis  of  the  Strategy 

Let  us  analyze  this  strategy  in  a  very  rough  and  approximate  fashion.  Suppose,  for 
the  sake  of  argument,  that  whenever  the  system  starts  in  the  lower  right  quadrant 
of  figure  1.7,  it  can  insert  the  peg  in  three  motions  on  average.  Experimentally,  two 
motions were required to actually insert the peg, and one motion to recognize that
the  peg  had  been  inserted.  Suppose  further,  that  if  the  system  starts  in  any  of  the 
remaining  three  quadrants,  it  invariably  fails  to  insert  the  peg,  but  instead,  within 
two  motions,  places  the  peg  above  the  hole  in  such  a  manner  that  the  camera  cannot 
extract  any  useful  edges.  Whenever  this  happens,  the  system  executes  a  random 
motion,  and  tries  again.  For  simplicity  let  us  assume  that  the  random  motion  moves 
the  peg  into  any  of  the  four  quadrants  with  equal  probability.  Thus  the  probability  of 
moving  into  the  quadrant  from  which  fast  goal  attainment  is  possible  is  1/4.  In  other 
words,  the  expected  number  of  randomizing  motions  required  before  the  system  starts 
from  the  lower  right  quadrant  is  four.  Since  two  motions  are  executed  before  each 
randomizing  motion,  the  expected  number  of  sensor-based  and  randomizing  motions 
executed  until  the  goal  is  attained  is  approximately  (2  +  1)  *  4  +  3,  that  is,  15. 
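
The arithmetic above can be checked both exactly and by simulation. The sketch below encodes only the simplifying assumptions of this analysis: a start in one of the three slow quadrants, two sensor-based motions before every randomizing motion, probability 1/4 of landing in the fast quadrant, and three motions to finish from there. The function names and the Monte Carlo check are our additions.

```python
from fractions import Fraction
import random


def expected_motions(p_fast=Fraction(1, 4), sensed_per_round=2, fast_motions=3):
    """Exact expected motion count from a slow quadrant under the stated assumptions."""
    expected_rounds = 1 / p_fast            # mean of a geometric distribution
    return (sensed_per_round + 1) * expected_rounds + fast_motions


def simulate(trials=100_000, seed=0):
    """Monte Carlo estimate of the same quantity."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        motions = 0
        while True:
            motions += 2 + 1                # two sensed motions, then one random motion
            if rng.randrange(4) == 0:       # landed in the fast quadrant
                break
        total += motions + 3                # three motions from the fast quadrant
    return total / trials
```

Here `expected_motions()` evaluates to exactly 15, matching (2 + 1) * 4 + 3, and `simulate()` should return a value close to it.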

Although  this  explanation  is  simplistic,  it  nonetheless  provides  an  explanation  of 
the  observed  data,  as  well  as  a  description  of  randomized  strategies  in  general.  The 
important  observation  is  that  a  randomized  strategy  is  not  a  guaranteed  strategy  in 
the  traditional  sense.  By  a  guaranteed  strategy  we  mean  a  set  of  possibly  conditional 
actions  that  are  certain  to  accomplish  a  specified  task  in  a  bounded  predetermined 
number  of  steps.  In  particular,  one  cannot  say  that  a  randomized  strategy  will 
succeed  in  a  fixed  predetermined  number  of  steps.  Rather,  the  strategy  runs  through 
a  sequence  of  operations  that  merely  provides  some  probability  of  success.  If  this 
sequence  is  repeatable  and  if  the  success  probabilities  sum  to  unity  over  an  infinite 
number  of  trials,  then  one  may  speak  of  eventual  convergence  of  the  randomized 
strategy.  Indeed,  one  may  even  be  able  to  compute  the  expected  number  of  steps 
until  convergence.  However,  one  cannot  generally  say  with  certainty  that  the  strategy 
will succeed on any particular iteration.

Figure 1.7: Four start regions around the hole. From one of these, the biases in the
system permit fast peg insertion. From the others, the robot either attains the goal or
finds its way via randomizing motions into the region from which fast goal attainment
is possible.

A More General Problem

The  previous  analysis  provides  a  rough  explanation  for  the  observed  behavior  of 
the  feedback  loop.  We  would  like  tools  for  analyzing  and  synthesizing  such  strategies 
more  precisely.  Most  of  the  rest  of  the  thesis  is  concerned  with  the  development  of  such 
tools.  Chapter  5  provides  a  detailed  analysis  of  a  simple  feedback  loop  for  attaining  a 
circular  region  in  the  plane.  This  problem  is  an  abstraction  of  the  translational  version 
of  the  peg-in-hole  problem  just  analyzed.  See  figure  1.8.  Recall  that  once  the  peg  has 
made  contact  with  the  surface  surrounding  the  hole,  then  the  only  motions  required 
to  move  the  peg  towards  the  hole  are  translations  and  rotations  in  the  plane.  This 
is  because  we  are  assuming  that  the  peg  is  aligned  properly  vertically.  If  the  peg  is 
actually  cylindrical,  then  only  translations  are  required.  The  peg  was  not  cylindrical 
in  our  implementation.  Nonetheless,  the  two-dimensional  feedback  strategy  analyzed 
in chapter 5 provides a reasonable abstraction of the peg-in-hole problem.
Higher-dimensional analogues of the discussion of chapter 5 apply more generally.

We  assume  also  that  the  system  can  recognize  when  the  peg  is  directly  above  or 
in  the  hole.  In  our  implementation  this  was  usually  possible  because  the  peg  would 
slightly  drop  into  the  hole  creating  a  very  narrow  slit  of  light  that  was  generally 
observed only when the peg was in the hole.1

Gaussian  Errors 

The  simple  feedback  strategy  will  be  analyzed  in  chapter  5  assuming  Gaussian  errors 
in  sensing  and  control.  Recall,  however,  that  the  strategy  itself  is  formulated  for  more 
general  types  of  errors.  Similar  to  the  implementation  of  the  peg-in-hole  example 
above,  the  feedback  strategy  of  chapter  5  operates  as  a  combination  of  sensing  and 
randomization.  Whenever  the  sensors  provide  information  useful  for  moving  towards 
the  goal,  then  the  strategy  executes  a  motion  guaranteed  to  move  closer  to  the 
goal.  Otherwise,  the  strategy  executes  a  random  motion.  As  we  will  see  later,  the 
randomization  has  a  natural  tendency  to  move  away  from  the  goal.  In  contrast,  the 
feedback  loop  uses  sensory  information  in  such  a  way  as  to  make  progress  towards 
the  goal.  However,  progress  is  not  always  possible  since  the  sensors  do  not  always 
provide  useful  information.  An  important  issue  therefore  is  to  determine  the  range  of 
locations  for  which  the  strategy  makes  progress  towards  the  goal  on  average.  As  we 
will  prove  in  chapters  3  and  5,  whenever  the  natural  motion  of  the  system  is  towards 
the  goal  on  average,  then  the  goal  is  attained  quickly. 

Figure  1.9  indicates  the  average  behavior  of  the  system  for  a  particular  set 
of  uncertainty  parameters.  In  particular,  the  sensing  error  is  an  unbiased  two- 
dimensional  Gaussian  distribution  with  standard  deviation  7/3.  The  qualitative  shape 
of  this  graph  applies  more  generally  to  different  uncertainty  parameters.  The  graph 
shows  the  expected  velocity  of  the  system  as  a  function  of  the  system’s  distance  from 
the  origin.  Recall  that  the  goal  is  a  circle  centered  at  the  origin.  A  negative  velocity 

1During the course of some fifty trials there were only a couple of occasions when the system
incorrectly thought that it had placed the peg in the hole.
[Plot: average velocity of the randomizing feedback strategy for the 2D peg-in-hole problem, as a function of a = distance from origin.]


Figure  1.9:  This  figure  shows  the  expected  radial  velocity  of  a  simple 
randomized  feedback  strategy  for  the  problem  of  moving  a  point  into  a 
circle,  as  in  figure  1.8.  That  problem  is  an  abstraction  of  the  peg-in-hole 
problem. 

The  expected  velocity  in  the  figure  is  positive  in  the  range  0  <  a  <  a0, 
and negative in the range a > a0, where a0 ≈ 3. This means that for
starting  positions  that  are  closer  to  the  origin  than  3,  the  randomization 
component  of  the  strategy  naturally  pushes  the  system  away  from  the 
origin.  For  starting  locations  further  away  from  the  origin  than  3, 
the  sensing  information  is  good  enough  to  pull  the  system  towards  the 
origin  on  the  average.  This  says  that  a  goal  whose  radius  is  at  least  3 
would  be  attained  very  quickly. 

In  contrast,  it  turns  out  that  a  strategy  which  wishes  to  guarantee 
progress  towards  the  goal  on  each  step  can  do  so  only  if  the  goal  radius 
is  at  least  15.1.  In  short,  the  randomized  strategy  has  considerably 
better  convergence  properties  than  does  the  guaranteed  strategy. 

This  graph  and  the  number  15.1  will  be  derived  in  chapter  5.  The 
sensing  error  is  assumed  to  be  normally  distributed  with  standard 
deviation  7/3.  Similarly,  the  velocity  error  is  assumed  to  be  normally 
distributed  with  standard  deviation  1/6  times  the  magnitude  of  the 
commanded  velocity. 
means that the system is making expected progress towards the origin. In particular,
we see that there is a region near the origin for which the natural tendency of the
system is to move away from the origin. Outside of this region the system moves
towards the origin on the average. The zero-velocity point is given by approximately
a0 = 3 in the figure. Thus if the goal has radius bigger than a0, the system will
quickly converge to the goal. Even if the goal radius is smaller than a0, the system will
eventually  converge,  but  now  the  convergence  may  require  considerable  time.  Instead 
of  drifting  towards  the  goal  on  average,  the  system  attains  the  goal  eventually  due 
to  the  diffusion  character  of  the  feedback  loop.  Figures  5.8  through  5.10  on  pages 
259-261  indicate  the  expected  convergence  times  of  the  feedback  strategy  for  different 
starting  locations  and  different  goal  radii. 

An important observation to take from figure 1.9 is that the randomized feedback
loop  has  a  wider  convergence  range  than  would  a  guaranteed  strategy  for  attaining 
the  goal.  In  order  to  see  this,  let  us  simply  state  that  for  the  example  of  figure  1.9 
the  feedback  strategy  requires  a  sensory  observation  that  lies  at  least  distance  8.1 
from  the  origin  in  order  to  guarantee  progress  towards  the  goal.  Whenever  a  sensory 
observation  lies  closer  to  the  goal,  the  feedback  strategy  executes  a  random  motion.  In 
order  to  guarantee  that  the  only  sensor  values  observed  will  be  at  least  distance  8.1 
from  the  origin,  it  turns  out  that  the  system  must  be  at  least  distance  15.1  from  the 
origin.2  Thus,  a  planner  wishing  to  guarantee,  prior  to  execution  time,  that  progress 
towards  the  goal  will  be  made  consistently  at  execution  time  would  only  construct 
plans  for  goals  of  radii  larger  than  15.1.  On  the  other  hand,  the  randomized  feedback 
strategy  converges  to  goals  of  arbitrary  size.  Furthermore,  for  the  unbiased  Gaussian 
errors  used  to  derive  figure  1.9,  the  strategy  converges  quickly  for  goals  of  radii  as 
small  as  3.  This  is  because  the  expected  approach  velocity  points  towards  the  goal 
whenever  the  system  is  at  least  distance  3  from  the  origin. 


1.2  Further  Examples 

1.2.1  Threading  a  needle 

There  are  numerous  examples  of  manipulation  tasks  in  which  randomization  arises 
naturally.  For  instance,  consider  the  task  of  threading  a  needle.  Without  perfect 
control  and  perfect  sensing,  it  is  unlikely  that  one  can  thread  a  needle  on  a  specific 
try.  Nonetheless,  within  a  reasonable  starting  location  near  the  eye  of  the  needle, 
there  is  a  definite  chance  of  success  on  each  attempt  to  insert  the  thread,  so  that 
success  can  be  guaranteed  by  trying  repeatedly.  This  is  an  example  of  a  probabilistic 
action  around  which  a  loop  has  been  placed. 


2These numbers are derived from a particular sensing model that will be explained in more detail
in the rest of the thesis. See in particular sections 2.2.3 and 5.2.

Figure 1.10: Two gears.


1.2.2  Inserting  a  key 

Similar examples are given by tasks such as inserting a key into a lock or closing a
desk  drawer  that  is  jamming.  In  the  key-lock  task  the  solution  consists  of  moving  the 
key  near  the  keyhole,  then  moving  the  key  back  and  forth  if  necessary  while  reducing 
the  distance  to  the  hole,  until  the  key  actually  slides  into  the  keyhole.  Once  the  key 
is  in  the  lock,  one  may  have  to  jiggle  it  back  and  forth  while  pushing  in  order  to  fully 
insert  the  key.  The  example  of  closing  a  desk  drawer  is  similar  to  this  last  step.  If 
the  drawer  jams,  one  may  randomly  jiggle  it  while  pushing  inward,  to  overcome  any 
jamming  forces. 

1.2.3  Meshing  two  gears 

A  wonderful  example  is  given  by  the  task  of  meshing  two  gears  (see  figure  1.10). 
Donald  ([Don87b]  and  [Don89])  first  used  this  example  to  demonstrate  a  task  in  which 
solutions  cannot  be  guaranteed  but  for  which  there  is  some  hope  of  success.  His  thesis 
was  that  a  robot  should  attempt  to  solve  such  tasks,  so  long  as  at  the  end  of  each 
attempt the robot is able to distinguish between success and failure. For the
gear-meshing case, should the success not be directly visible, Donald suggested a test that
consists  of  trying  to  rotate  one  gear.  If  the  other  gear  rotates  as  well,  with  the  proper 
gearing  ratio,  then  the  meshing  operation  is  known  to  have  completed  successfully. 
Otherwise,  it  has  failed.  In  the  context  of  the  randomized  actions  of  this  thesis,  the 
attempt  to  mesh  the  gears  will  play  the  role  of  a  non-deterministic  action,  around 
which we will place a loop whose active randomization guarantees eventual success.

In order to get a flavor of the approach, consider a simplified version of the
gear-meshing problem in which one can move the gears towards each other perfectly, so
that  the  centers  of  rotation  travel  on  the  straight  line  joining  them  (this  might  be 
possible  if  the  gears  are  mounted  on  a  telescoping  device  constraining  their  centers  of 
rotation).  As  the  gears  are  brought  near  each  other,  they  will  mesh  if  they  are  properly 
aligned.  In  other  words,  for  some  set  of  starting  orientations,  the  two  gears  will  mesh 
if  brought  together.  The  range  of  starting  orientations  that  permit  successful  meshing 
is some subset of the two-dimensional space [0, 2π] × [0, 2π] that describes the possible
orientations of the two gears. Suppose that one cannot sense or control the orientation
of the gears well enough to be able to ensure that the gears are properly oriented. If
initially the gears are randomly oriented, then the ratio of the area of the successful
starting range to 4π² is the probability of success on any given try. A randomized
strategy  for  meshing  the  gears  consists  of  first  spinning  the  gears  to  achieve  a  random 
orientation,  then  bringing  them  together  in  an  attempt  to  mesh  them,  followed  by  a 
test  to  determine  whether  they  have  indeed  meshed  properly.  This  action  is  repeated 
until  the  test  indicates  that  the  gears  have  been  meshed.  The  expected  number  of 
attempts  until  success  is  simply  one  over  the  probability  of  success  on  a  given  try. 
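
The spin, mesh, and test loop can be sketched in a few lines. The success region used below is a purely hypothetical stripe model, assumed here only for illustration: two identical gears mesh when their combined phase lies within `half_width` radians of a perfectly meshed phase. The real region depends on the gear geometry and on friction. The names `meshes`, `attempts_until_meshed`, and `success_probability` are likewise illustrative.

```python
import math
import random

def meshes(theta1, theta2, teeth=4, half_width=0.2):
    """Hypothetical stripe model: success if the combined phase of the two
    gears lies within `half_width` radians of a perfectly meshed phase."""
    pitch = 2 * math.pi / teeth
    phase = (theta1 + theta2) % pitch
    return min(phase, pitch - phase) <= half_width

def success_probability(teeth=4, half_width=0.2):
    """Area of the stripes divided by 4*pi^2, for the hypothetical model."""
    pitch = 2 * math.pi / teeth
    return min(1.0, 2 * half_width / pitch)

def attempts_until_meshed(rng, teeth=4, half_width=0.2):
    """Spin both gears to random orientations, try to mesh, test, and repeat."""
    attempts = 0
    while True:
        attempts += 1
        theta1 = rng.uniform(0.0, 2 * math.pi)
        theta2 = rng.uniform(0.0, 2 * math.pi)
        if meshes(theta1, theta2, teeth, half_width):
            return attempts

rng = random.Random(1)
runs = [attempts_until_meshed(rng) for _ in range(20_000)]
# The sample mean should be close to one over the success probability.
print(sum(runs) / len(runs), 1 / success_probability())
```

Note that the loop itself never uses `success_probability`. It only needs the test at the end of each attempt, which is the point made in the discussion that follows.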

This  example  raises  a  number  of  important  issues.  First,  let  us  consider  the 
probability  of  success  on  a  given  iteration.  In  order  to  specify  the  strategy  of  looping 
around  a  primitive  action  one  really  does  not  need  to  know  what  this  probability  of 
success  is.  It  is  sufficient  to  know  that  on  each  try  there  is  some  chance  of  success 
and  that  the  sum  of  the  probabilities  of  success  over  an  infinite  number  of  trials  is 
one.  For  instance,  it  is  sufficient  to  know  that  the  probability  of  success  on  each  trial 
is  larger  than  some  non-zero  constant. 
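
To see why a constant lower bound suffices, suppose that each trial succeeds with probability at least c, whatever happened on earlier trials. Then n consecutive failures occur with probability at most (1 - c)^n, which tends to zero as n grows. The sketch below evaluates this bound; the function names are illustrative, and the lower bound on every trial is an assumption of the sketch.

```python
import math

def failure_bound(c, n):
    """Upper bound on the probability of n consecutive failures when every
    trial succeeds with probability at least c."""
    return (1.0 - c) ** n

def trials_for_confidence(c, confidence):
    """Smallest n with 1 - (1 - c)**n >= confidence, for 0 < c < 1
    (computed in floating point, so treat exact boundary cases with care)."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - c))

# With a success probability of at least 1/4 per trial:
print(trials_for_confidence(0.25, 0.99))   # 17
print(failure_bound(0.25, 17))             # about 0.0075
```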

While  the  specification  of  the  strategy  does  not  depend  on  the  probability  of 
success,  it  is  nonetheless  sometimes  desirable  to  compute  this  probability,  either  to 
ascertain  that  it  is  non-zero  or  to  compare  it  with  other  possible  strategies.  This 
entails  computing  the  area  of  the  range  of  initial  orientations  that  permit  successful 
gear  meshing.  Figure  1.11  portrays  this  range  in  a  highly  approximate  fashion. 
Essentially  the  range  of  successful  initial  orientations  consists  of  a  set  of  diagonal 
stripes  in  the  space  of  orientations  of  the  two  gears.  The  number  of  stripes  depends 
on the number of gear teeth, and the inclination of the stripes depends on the gearing
ratio. The center axes of the stripes correspond to orientations of the two gears at
which  the  gears  are  perfectly  meshed.  The  stripes  themselves  include  orientations  at 
which  the  gears  are  not  perfectly  meshed,  but  from  which  the  gears  will  compliantly 
rotate  to  perfect  meshing  if  they  are  pushed  together.  Computing  the  exact  shape 
of  these  stripes  is  in  general  a  difficult  task,  which  depends  on  the  exact  geometry  of 
the  parts  and  on  the  coefficient  of  friction  between  them.  The  basic  idea  is  to  start 
from  a  goal  consisting  of  those  orientations  at  which  the  gears  are  perfectly  meshed, 
then  backchain,  recursively  determining  all  those  points  that  can  move  compliantly 
toward  the  goal  under  a  given  applied  force.  The  problem  is  complicated  by  the 
rotational  compliance  of  the  gears.  This  backchaining  process  is  known  as  computing 
preimages  or  backprojections  (see  [LMT],  [Mas84]  and  [Erd84]).  We  will  refer  to 
this approach as the LMT preimage planning approach. Donald (see [Don87b] and
[Don89]) has investigated approximate techniques for computing such backprojections
in the gearing case. We will not examine those techniques here. Instead, we will
convey the basic idea of how one might compute success probabilities with a slightly
simpler example.

Figure 1.11: Schematic representation of the range of initial orientations that permit
successful gear-meshing. These are indicated by the hatched areas. [The figure
corresponds to gears with only four teeth. Realistic gears generate more stripes.]

If  we  fix  the  orientation  of  one  of  the  gears,  the  successful  starting  orientations  of 
the  other  gear  form  a  periodic  pattern  of  disconnected  intervals.  This  pattern  looks 
very  much  like  a  sieve,  and  indeed  we  can  think  of  the  gear  as  a  sieve  that  filters  out 
bad  orientations  of  the  other  gear  or,  more  generally,  improperly  shaped  gears.  Let  us 
look  then  at  a  sieve  to  demonstrate  in  a  simpler  setting  the  ideas  behind  randomized 
strategies. 

Figure  1.12  shows  a  simple  grating  that  acts  like  a  one-dimensional  sieve, 
permitting  some  two-dimensional  objects  through  but  not  others.  Let  us  suppose 
that  the  object  we  would  like  to  get  through  the  sieve  is  a  square,  as  shown  in  figure 
1.13.  Assume  the  object  can  only  translate,  not  rotate.  Relative  to  the  indicated 
reference  point  on  the  object,  the  translational  constraints  imposed  by  the  sieve  on 
the object are as shown in figure 1.14. This is the configuration space (see [Loz81]
and  [Loz83])  representation  of  the  sieve.  Moving  the  object  through  the  real  sieve 
corresponds  directly  to  moving  a  point  through  this  configuration  space  sieve.  The 
representation depends of course on the object being moved. Indeed, for objects that
are too large, the configuration space sieve is simply a solid horizontal slab, indicating
that the object cannot be moved through the sieve.

Figure 1.12: A grating of two-dimensional obstacles. The grating acts as a
one-dimensional sieve, only allowing objects small enough to move through the sieve.
[Figure label: motion of objects through sieve.]

Given the configuration space sieve, we are now in a position similar to the
gear-meshing example. In particular, let us suppose that the analogue to moving the gears
together  consists  of  translating  the  object  vertically  downward  (for  instance,  under 
the  influence  of  gravity).  Then,  for  certain  starting  configurations,  the  object  will 
translate  through  the  sieve,  while  for  other  configurations  it  will  become  stuck  on  the 
sieve  elements.  Thus  the  sieve  also  acts  as  a  configuration  sieve,  filtering  out  certain 
initial  starting  configurations  of  the  object.  Of  course,  that  is  not  exactly  the  purpose 
of  the  sieve.  After  all,  one  would  like  the  object  to  translate  through  the  sieve.  In 
order  to  ensure  this,  one  shakes  the  sieve,  or  equivalently,  one  randomizes  the  initial 
position  of  the  part.  This  operation  corresponds  to  the  act  of  twirling  the  gears  in 
order  to  randomize  their  configurations  on  each  meshing  attempt. 

Let  us  compute  the  probability  of  success  for  the  sieve  example.  First,  let  us 
assume  that  there  is  no  control  uncertainty,  so  that  the  object  translates  straight 
down  when  commanded  to  do  so.  Figure  1.15  portrays  those  start  configurations  from 
which  the  object  is  guaranteed  to  pass  through  the  sieve  when  translating  downward 
(recall  that  the  part  is  represented  by  a  point  in  its  configuration  space).  Suppose  that 
the  sieve  is  periodic  and  unbounded,  and  suppose  further  that  the  start  configuration 
of  the  object  is  uniformly  distributed.  One  then  sees  that  the  probability  of  success 
is  simply  the  ratio  of  the  length  of  a  hole  in  an  elemental  period  of  the  sieve  to  the 
full  length  of  the  elemental  period. 
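
As a quick check of this ratio, the sketch below compares the closed form with a simulation of straight-down drops. It assumes a particular layout (each period begins with its hole) and approximates the unbounded sieve by a finite span of whole periods. Widths are measured in configuration space, and the function names are illustrative.

```python
import random

def sieve_success_probability(hole, period):
    """Fraction of one period occupied by the hole (configuration-space widths)."""
    if not 0 <= hole <= period:
        raise ValueError("need 0 <= hole <= period")
    return hole / period

def simulate_sieve(hole, period, trials=100_000, seed=0, span=1000):
    """Drop a point straight down from a uniformly random horizontal position
    over `span` periods; it passes if it lands within a hole."""
    rng = random.Random(seed)
    passed = 0
    for _ in range(trials):
        x = rng.uniform(0.0, span * period)
        if x % period < hole:   # holes occupy [0, hole) in each period (assumed layout)
            passed += 1
    return passed / trials

print(sieve_success_probability(1.0, 4.0))   # 0.25
print(simulate_sieve(1.0, 4.0))              # an estimate close to 0.25
```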

[Figure labels: solid sieve element; object to move through sieve, with reference point; sieve element in configuration space.]

Figure 1.13: This figure shows the constraints imposed on the motion of the square
by the trapezoidal sieve element. The bottom polygon describes the locations of the
reference point of the square for which there would be contact between the square
and the trapezoid.

Figure 1.14: This figure shows the configuration space sieve corresponding to the real
space sieve of figure 1.12 for the motion of a square as in figure 1.13.

Figure 1.15: Perfect velocity preimage. For starting locations in the shaded area,
the system is guaranteed to pass through the sieve by moving straight down. For
other locations, the system will get stuck on a horizontal edge. If the starting
location is uniformly distributed, then the probability of passing through the sieve is
approximately a/b.

[Figure label: probability of success is at least A/B.]

Figure 1.16: Preimage assuming velocity error. For starting locations in the shaded
area, the system is guaranteed to pass through the sieve, given the velocity error cone.
For other locations, the system may get stuck on a horizontal edge. If the starting
location is uniformly distributed in the infinite horizontal strip, then the probability
of passing through the sieve is at least A/B.

In the previous computation, it was enough to look at one-dimensional quantities
in computing the probability of success, since the vertical coordinate of a point above
the  sieve  did  not  matter  in  determining  whether  the  point  would  translate  through 
the  sieve.  However  in  general  one  needs  to  compute  the  ratio  of  the  area  of  successful 
starting  configurations  to  the  area  of  possible  starting  configurations.  Suppose,  for 
instance, that whenever a translation is commanded the actual motion lies within
some  velocity  error  cone  about  the  nominal  commanded  velocity.  Then  the  set  of 
initial  configurations  from  which  translation  through  the  sieve  is  guaranteed  changes. 
Indeed,  the  successful  starting  configurations  are  delineated  by  triangles  that  are 
determined  by  erecting  the  velocity  error  cone  above  the  sieve  holes,  as  shown  in 
figure  1.16.  Suppose  that  the  initial  configuration  of  the  object  is  known  to  lie  in 
some  region,  uniformly  distributed.  Consider  those  portions  of  the  triangles  that 
lie  within  this  starting  region,  and  sum  up  the  areas  of  these  portions.  Then  the 
probability  of  success  is  given  by  the  ratio  of  this  area  to  the  full  area  of  the  starting 
region.  This  computation  is  also  indicated  in  the  figure  for  a  periodic  sieve  with  a 
periodic  starting  region. 
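
For a periodic sieve and a horizontal starting strip, the area ratio can be written down directly under some simplifying assumptions, which are ours and not derived from the thesis: the error cone is symmetric about straight down with a given half-angle, each hole has a fixed configuration-space width, heights are measured from the top of the hole, and a point that reaches the hole opening is taken to pass through. Under these assumptions a start point at a given height is safe only if the whole footprint of the cone falls inside the hole, so the safe width shrinks linearly with height and the safe region is a triangle. The function names are illustrative.

```python
import math

def strong_preimage_area(hole, cone_half_angle, height):
    """Area, within one period, of start points from which every motion in the
    error cone reaches the hole opening (a triangle clipped at `height`)."""
    t = math.tan(cone_half_angle)
    if t == 0.0:
        return hole * height          # perfect control: a full rectangle
    apex = hole / (2.0 * t)           # height at which the triangle closes
    h = min(height, apex)
    return hole * h - t * h * h       # integral of (hole - 2*t*z) for z in [0, h]

def success_lower_bound(hole, period, cone_half_angle, height):
    """Ratio of the clipped triangle to the period-by-height start strip."""
    return strong_preimage_area(hole, cone_half_angle, height) / (period * height)

# Hole of width 1 in a period of 4, start strip of height 2.
print(success_lower_bound(1.0, 4.0, 0.0, 2.0))             # 0.25, the perfect-control ratio
print(success_lower_bound(1.0, 4.0, math.atan(0.5), 2.0))  # about 0.0625
```

With a zero half-angle the bound reduces to the one-dimensional ratio of hole length to period, as expected.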

Actually,  the  probability  thus  computed  is  an  underestimate.  This  is  because 
the  probability  is  determined  only  by  considering  configurations  from  which  passage 
through  the  sieve  is  guaranteed,  independent  of  the  actual  motion  taken  within  the 
velocity error cone. Such regions are known as strong preimages (see [LMT]). It is,
however, also possible that some points that lie outside of these strong preimages
may  for  some  possible  error  velocity  pass  through  the  sieve.  However,  since  this 
passage  cannot  be  guaranteed,  without  further  information,  one  cannot  say  anything 
about  how  the  possibility  of  success  for  these  start  configurations  affects  the  total 
probability  of  success.  If  the  probability  distribution  of  the  velocity  errors  is  known, 
then  it  can  be  used  to  compute  an  additional  term  that  figures  into  the  probability  of 
success.  Without  such  knowledge,  however,  we  can  imagine  that  no  point  outside  of 
the  strong  preimages  ever  passes  through  the  sieve,  and  thus  our  original  probability 
computation  is  the  best  possible  lower  bound. 

Another  issue  raised  by  these  examples  concerns  the  need  for  randomization.  We 
will  discuss  this  issue  further  in  the  next  section,  but  let  us  briefly  consider  the 
question  of  randomization  in  the  context  of  the  gear  and  sieve  examples.  One  might 
wonder  why  it  is  ever  necessary  to  randomize  the  start  configuration  of  a  part,  as 
opposed  to  deterministically  searching  the  set  of  possible  start  locations.  For  instance, 
in  the  gear  example,  even  if  the  orientations  of  the  gears  are  not  measurable  well 
enough  to  ensure  proper  initial  alignment,  one  could  imagine  rotating  one  or  both  of 
the  gears  slightly  after  each  meshing  attempt,  then  retrying.  If  the  rotation  is  small 
enough,  then  this  process  should  eventually  encounter  a  starting  orientation  from 
which  successful  meshing  is  possible.  Unfortunately,  there  are  some  problems  with  this 
approach.  First,  it  may  be  impossible  to  rotate  the  gears  finely  enough  to  guarantee 
that  the  rotation  will  not  just  jump  over  the  successful  start  orientation.  And  second, 
after  a  failed  meshing  attempt  the  configuration  of  the  gears  will  have  changed,  so  that 
it  is  not  at  all  clear  that  incremental  rotations  will  eventually  encounter  a  successful 
starting  configuration.  In  principle,  the  system  could  get  into  a  loop,  starting  from 
a  given  unsuccessful  orientation,  rotating  during  the  failed  attempt  to  an  orientation 
exactly offset from the start orientation by the angle of increment, thus ensuring that
the  new  start  orientation  after  incrementing  is  again  the  old  start  orientation,  and 
so  forth.  Of  course,  if  one’s  predictive  capabilities  are  good  enough,  one  could  detect 
the  potential  for  such  a  loop,  but  that  is  not  always  the  case.  A  straightforward 
method  of  avoiding  the  possibility  of  a  deterministic  loop  is  to  randomize  the  initial 
conditions.  We  will  discuss  this  approach  in  more  detail  in  the  next  section. 

There  is  another  reason  for  randomizing,  which  again  relates  to  the  accuracy  with 
which  one  can  model  the  world.  In  the  sieve  example,  for  instance,  the  spacing 
between  sieve  elements  may  be  slightly  non-uniform,  so  that  one  cannot  predict 
exactly  where  a  hole  will  be.  Taken  over  a  large  segment  of  the  sieve,  the  density  of 
holes  to  non-holes  may  be  the  same  as  in  the  uniform  case,  but  it  may  vary  locally. 
Thus  it  may  make  sense  to  randomize  the  start  location  to  take  advantage  of  the  high 
overall  probability  of  success,  avoiding  possibly  low  local  probabilities  of  success.  Let 
us  make  this  argument  more  precise.  Suppose  that  in  a  perfectly  shaped  sieve,  the 
period of the sieve has length b, of which length a is free space, and length b − a is
an obstacle. Thus the probability of success (assuming perfect control) is a/b. See
again figure 1.15. Now suppose that the sieve is not built very well. Instead there
are two types of sieve sections. In Type One sections the hole has size a + ε, while in
Type Two sections the hole has size a − ε, where ε is some positive number satisfying
0 < ε < min{a, b − a}. Suppose that the underlying period of the sieve still has size b,
and that the two types of sieve sections occur with equal frequency when viewed over
the entire sieve, although locally one or other type may dominate. If the state of the
system happens to be in a region in which there are only Type One sieve sections,
then the probability of success is (a + ε)/b, whereas if the system happens to be in
a region in which there are only Type Two sections, then the probability of success
is (a − ε)/b. If ε is close to a, then the probability of success might be very near
zero if the system is in this second region. However, if the system first randomizes its
initial position, so that it starts off with a uniformly chosen initial position, then the
resulting probability of success is given again by a/b.
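
The averaging step can be made explicit with exact rational arithmetic. In the sketch below, `eps` stands for the perturbation of the hole size, and the specific numbers are chosen only for illustration.

```python
from fractions import Fraction

def local_probabilities(a, b, eps):
    """Success probabilities in regions containing only Type One sections
    or only Type Two sections."""
    a, b, eps = Fraction(a), Fraction(b), Fraction(eps)
    return (a + eps) / b, (a - eps) / b

def randomized_probability(a, b, eps):
    """Uniform start over the whole sieve, where the two types occur equally often."""
    p_one, p_two = local_probabilities(a, b, eps)
    return (p_one + p_two) / 2

# Hole size 1, period 4, perturbation 9/10 (close to the hole size).
print(local_probabilities(1, 4, Fraction(9, 10)))     # 19/40 and 1/40
print(randomized_probability(1, 4, Fraction(9, 10)))  # 1/4
```

In a region of Type Two sections the local success probability drops to 1/40, while the randomized start recovers the nominal value of 1/4.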

We  will  discuss  a  related  example  in  section  2.4.  Another  related  example  is  given 
by  a  person  trying  to  open  a  door  in  the  dark.  Suppose  he  has  n  keys  of  which  k 
will  open  the  door.  If  he  tries  the  keys  in  order,  in  the  worst  case  he  may  need  to 
try n − k + 1 keys before success, but if he tries them randomly (with replacement),
then, although the worst case is now unbounded, the expected number is n/k. If n
and k are large and k is comparable to n, but still considerably less than n, then
it makes sense to try the randomized approach. This is essentially the motivation
behind the use of probabilistic algorithms in computer science. If k is small, then the
deterministic approach is preferable. However, even in this case, if the deterministic
approach is subject to failure, in that the person may drop the keys or forget which
keys he has already tried, then the randomized approach is again useful.
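
The two quantities in the key example are easy to compare numerically. The sketch below is illustrative only; the function names and the labeling of the good keys are assumptions made for the example.

```python
import random

def worst_case_in_order(n, k):
    """Trying keys in a fixed order: all n - k bad keys may come first."""
    return n - k + 1

def expected_random_with_replacement(n, k):
    """Each random draw succeeds with probability k/n, so the mean is n/k."""
    return n / k

def simulate_random_tries(n, k, trials=50_000, seed=0):
    """Average number of random draws (with replacement) until a good key."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        tries = 1
        while rng.randrange(n) >= k:   # keys 0..k-1 are the good ones (a labeling choice)
            tries += 1
        total += tries
    return total / trials

print(worst_case_in_order(100, 20))               # 81
print(expected_random_with_replacement(100, 20))  # 5.0
print(simulate_random_tries(100, 20))             # an estimate close to 5
```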

To summarize, randomization is useful in two ways. First, randomization foils
an adversarial world that might cause a deterministic search to loop. Second,
randomization may compensate for imperfect world knowledge, by ensuring that
successful actions are taken at least occasionally, and in some cases by ensuring that
successful actions are taken with high enough frequency.

1.3  Why  Randomization? 

The  main  purpose  of  randomization  is  to  increase  the  class  of  solvable  tasks.  In 
particular,  randomized  strategies  are  useful  for  solving  tasks  for  which  there  is  no 
guaranteed  solution  but  for  which  there  is  some  probability  of  success.  A  guaranteed 
solution  in  this  context  refers  to  a  strategy  consisting  of  a  set  of  possibly  conditional 
actions  that  are  certain  to  accomplish  a  task  in  a  bounded  predetermined  amount  of 
time.  In  contrast,  a  randomized  strategy  is  expected  only  to  attain  the  goal  in  some 
expected  amount  of  time.  While  giving  up  predetermined  convergence,  randomized 
strategies  provide  a  tool  for  solving  a  broader  class  of  tasks. 

Randomization  also  increases  the  class  of  solvable  tasks  by  reducing  the  demands 
made  on  modelling  and  prediction.  For  instance,  in  the  peg-in-hole  task  at  the 
beginning  of  this  chapter,  we  were  not  required  to  model  very  accurately  the  errors 
introduced  by  the  calibration  process.  More  generally,  one  can  imagine  tasks  in 
which  geometrical  errors  in  the  modelling  of  parts  prevent  guaranteed  solutions.  For 
instance,  there  might  exist  slight  nicks  and  bumps  on  the  hole  surface,  which  could 
prevent  successful  entry  of  the  peg  into  the  hole.  In  general,  it  is  very  difficult  to  plan 
explicitly  for  such  irregularities.  However,  for  a  large  class  of  such  irregularities  the 
system  can  avoid  becoming  permanently  stuck  by  wiggling  the  peg  slightly,  that  is, 
by  introducing  randomized  motions. 

Reducing  the  demands  on  modelling  and  prediction  also  permits  simpler  solutions 
to  tasks  for  which  there  might  actually  exist  guaranteed  strategies.  In  addition, 
reducing  the  knowledge  requirements  of  a  strategy  reduces  its  brittleness. 

One  question  remains.  It  deals  with  the  difference  between  active  randomization 
and  probabilistic  or  non-deterministic  actions.  In  the  context  of  this  thesis,  to  say  that 
a  strategy  or  an  action  is  probabilistic  is  to  say  that  it  has  some  non-zero  probability 
of  success,  but  may  not  be  guaranteed  to  succeed.  More  formally,  an  action  is 
probabilistic  if  its  effect  on  each  state  is  modelled  as  a  set  of  configurations,  each 
of  which  has  some  non-zero  probability  of  occurring.  Often  a  probabilistic  strategy 
will  consist  of  some  loop  around  a  probabilistic  action,  the  purpose  of  the  loop  being 
to  guarantee  eventual  convergence. 

More  generally,  a  strategy  or  action  is  said  to  be  non-deterministic  if  its  outcome  is
modelled  as  a  set  of  possible  configurations.  The  non-deterministic  model  is  intended 
as  a  worst-case  model.  It  says  simply  that  an  action  might  cause  a  transition  to  any 
one  of  a  set  of  possible  configurations.  However,  nothing  is  said  about  the  actual 
likelihood  of  that  transition  occurring. 

While  an  action  may  be  probabilistic,  the  decision  to  execute  that  action  is  often 
deterministic.  In  other  words,  given  certain  sensor  values,  the  system  selects  a  certain 
action  in  a  completely  deterministic  fashion.  It  is  simply  the  outcome  of  the  action 
that  is  probabilistic  or  non-deterministic.  An  alternate  approach  is  for  a  system  to 
actively  make  random  choices  in  selecting  actions.  This  process  is  what  we  have  been
calling  randomization.

We  have  already  indicated  in  the  sieving  example  the  usefulness  of  randomization. 
However,  it  may  not  be  clear  why  randomization  is  ever  really  required.  After  all,  one 
could  imagine  that  a  system  could  simulate  in  a  deterministic  fashion  a  randomizing 
system,  simply  by  enumerating  in  some  order  all  possible  random  decisions  of  the 
randomizing  system,  until  the  goal  is  attained. 

One  possible  benefit  of  active  randomization  might  be  improved  convergence 
times.  There  are  certainly  arguments  from  the  theory  of  randomized  algorithms 
that  suggest  that  randomization  can  speed  up  convergence  of  certain  tasks.  Indeed, 
we  will  exhibit  an  example  in  chapter  3  for  which  randomization  does  speed  up 
convergence.  However,  the  problem  here  is  slightly  different,  in  essentially  three  ways. 
First,  unlike  decision  problems  in  algorithms,  when  moving  in  the  physical  world  one 
cannot  arbitrarily  restart  the  problem  to  improve  convergence.  For  instance,  for 
decision  problems  in  Bounded  Probabilistic  Polynomial  time,  one  can  repeatedly  ask 
the  decision  question,  thereby  making  the  probability  of  error  as  small  as  desired. 
Furthermore,  this  may  be  done  in  a  polynomial  amount  of  time.  In  contrast,  once  a 
robot  has  moved,  it  may  have  introduced  uncertainty  into  its  configuration,  and  thus 
may  not  be  able  to  restart  from  the  same  location  should  it  fail  to  attain  its  goal.  To 
some  extent  one  can  define  this  issue  away,  by  insisting  that  it  be  possible  to  place 
a  loop  around  any  probabilistic  sequence  of  actions.  However,  the  basic  difference 
remains.  Second,  many  robot  planning  problems  in  the  presence  of  uncertainty 
are  at  least  PSPACE-hard  (see  [Nat88],  [CR],  and  [Can88]).  Thus  the  hope  for 
polynomial  speedup  by  moving  to  probabilistic  algorithms  seems  futile  in  general 
(see  [Gill]).  Third,  our  main  interest  lies  in  extending  the  class  of  solvable  tasks, 
with  performance  issues  entering  as  a  secondary  motivation.  Thus  the  question  of 
whether  randomization  is  ever  required  enters  at  the  level  of  task  solvability  rather 
than  purely  at  the  level  of  convergence  time. 

The  need  for  randomization  arises  in  the  context  of  non-deterministic  actions. 
When  actions  are  probabilistic,  one  can,  at  least  in  principle,  compare  different 
decisions  based  on  their  probability  of  success,  then  select  that  decision  which 
maximizes  the  probability  of  success.  No  randomization  is  required.  However,  in 
the  setting  of  non-deterministic  actions,  one  must  be  prepared  to  handle  worst-case 
scenarios.  This  means  that  one  should  view  uncertainty  as  an  adversary  who  is 
trying  to  foil  the  system’s  strategy  for  attaining  the  goal,  and  who  will  therefore 
always  choose  that  outcome  of  a  non-deterministic  action  that  prevents  the  system 
from  attaining  its  goal.  Again,  it  may  seem  that  one  can  enumerate  all  decisions  and 
actions,  then  select  that  sequence  of  actions  that  is  guaranteed  to  attain  the  goal 
despite  the  most  devilish  adversary.  Indeed,  this  is  the  approach  taken  in  planning 
systems  that  generate  guaranteed  plans,  that  is,  plans  guaranteed  to  attain  the  goal  in 
a  predetermined  bounded  number  of  steps.  However,  not  all  tasks  admit  guaranteed
solutions.  The  interest  of  this  thesis  is  in  tasks  for  which  there  may  not  exist  any 
guaranteed  plan,  or  for  which  finding  a  guaranteed  plan  may  be  very  difficult.  In 
that  setting  randomization  can  play  a  useful  role,  in  that  it  can  prevent  an  adversary 
from  forever  foiling  the  goal-attaining  strategy.  We  should  note  that  there  is  a  tacit
assumption  here  that  nature,  that  is,  the  adversary,  cannot  control  or  observe  the
dice  used  to  make  the  randomizing  decisions.  We  now  demonstrate  the  usefulness  of
randomization  with  a  very  simple  example.

Figure  1.17:  This  is  a  state  graph  with  non-deterministic  actions.  There  is  no
guaranteed  strategy  for  attaining  the  goal  if  the  state  of  the  system  is  unknown.
However,  by  randomly  and  repeatedly  executing  one  of  the  actions  A1  or  A2,  the  goal
is  attained  in  two  steps  on  average.

Imagine  a  discrete  three-state  system,  as  shown  in  figure  1.17.  There  is  one  goal 
state  G,  and  two  other  states  labelled  as  state  s1  and  state  s2.  Additionally,  there
are  two  actions,  A1  and  A2,  that  have  non-deterministic  outcomes.  If  the  system  is
in  state  s1  then  action  A1  is  guaranteed  to  move  the  system  to  the  goal.  However,
action  A2  will  non-deterministically  move  the  system  from  s1  either  back  to  s1  or  to
the  other  state  s2.  Similarly,  if  the  system  is  in  state  s2,  then  action  A2  is  guaranteed
to  attain  the  goal,  while  action  A1  will  non-deterministically  either  remain  in  s2  or
move  to  state  s1.  Suppose  that  the  only  sensing  available  is  goal  recognition.  In  other
words,  the  system  can  detect  goal  attainment,  but  cannot  decide  whether  it  is  in  state
s1  or  s2.  We  observe  that  there  is  no  guaranteed  strategy  for  attaining  the  goal.  For
any  deterministic  sequence  of  actions  there  is  some  interpretation  of  the  diagram  for 
which  the  sequence  fails  to  achieve  the  goal.  Said  differently,  from  a  worst-case  point 
of  view,  no  finite  or  infinite  deterministic  strategy  is  guaranteed  to  attain  the  goal. 

As  an  example,  consider  the  sequence  of  actions  A1;  A1;  A2;  A1;  A2;  A2.  The
following  is  a  possible  sequence  of  transitions  that  fails  to  attain  the  goal.

s2  --A1-->  s2  --A1-->  s1  --A2-->  s2  --A1-->  s1  --A2-->  s1  --A2-->  <whatever>.

In  order  to  prove  that  there  is  no  guaranteed  strategy  for  attaining  the  goal,
imagine  an  adversary  who  can  look  ahead  to  the  next  action  A_i,  and  use  the  current
action  to  either  stay  in  the  current  state  or  move  to  the  other  non-goal  state.  In
particular,  the  adversary  can  always  move  to  state  s_j,  with  j  ≠  i,  where  the  index  i
is  determined  by  the  action  A_i.  (Here  both  i  and  j  are  either  1  or  2.)

The  introduction  of  an  adversary  is  just  a  proof  artifice  of  course.  There  is  no 
need  to  actually  have  someone  look  at  a  purported  strategy  for  attaining  the  goal. 
The  point  is  that  even  without  an  adversary,  the  transition  diagram  might  behave  as 
if  there  were  such  an  adversary,  for  any  fixed  deterministic  strategy.  For  instance, 
the  transition  diagram  might  be  the  visible  portion  of  a  considerably  more  complex 
machine  or  natural  process,  whose  transitions  govern  the  apparently  non-deterministic
transitions  of  A1  and  A2.

One  question  one  might  ask  is  how  complex  such  a  hidden  state  diagram  must  be 
to  foil  a  particular  deterministic  strategy.  In  particular,  if  one  limits  the  complexity 
of  the  hidden  diagram,  then  sufficiently  long  deterministic  strategies  will  eventually 
attain  the  goal.  We  will  not  delve  into  this  question. 

Another  question  concerns  the  importance  of  the  term  “fixed”.  If  one  varies 
the  strategy  then  one  increases  the  likelihood  of  obtaining  a  guaranteed  strategy. 
Of  course,  varying  a  deterministic  strategy  in  a  deterministic  manner  yields  another 
deterministic  strategy.  Instead,  suppose  that  one  varies  the  strategy  by  randomizing. 
Then  we  see  that  there  exists  a  randomized  strategy  whose  expected  convergence 
time  is  very  low,  namely  two  steps.  This  strategy  randomly  chooses  between  actions 
A1  and  A2  on  each  step,  choosing  each  action  with  probability  1/2.  Since  the  system
is  in  some  state  s_i,  the  strategy  will  choose  the  correct  action  A_i  for  that  state  with
probability  1/2.  This  is  true  independent  of  the  behavior  of  the  system.  Thus,  by  a 
waiting  time  argument,  the  expected  time  until  the  system  guesses  the  correct  action 
is  two.  In  turn  this  says  that  the  expected  time  until  the  goal  is  attained  is  no 
greater  than  two.  [It  may  actually  be  less  than  two,  if  the  underlying  transitions  are 
themselves  probabilistic  rather  than  adversarial.]  This  example  shows  clearly  how 
randomization  can  solve  tasks  for  which  there  are  no  guaranteed  strategies,  and  for 
which  no  deterministic  simulation  of  the  randomization  is  guaranteed  to  solve  the 
task. 
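
The argument of the three-state example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code from the thesis: states s1 and s2 are encoded as the integers 1 and 2, actions A1 and A2 likewise, and the function names and the adversary rule `avoid_last` are hypothetical. The first function plays the look-ahead adversary against a fixed action sequence; the second runs the coin-flipping strategy against an adversary that sees past actions but not the coin.

```python
import random


def other(i):
    """Return the index (1 or 2) that is not i."""
    return 3 - i


def adversary_foils(actions):
    """Run a fixed action sequence against the look-ahead adversary.

    Before each action A_i the system is kept in state s_j with j != i,
    so A_i is never the action that reaches G. Returns the visited states.
    """
    state = other(actions[0])              # adversary also picks the start state
    trace = [state]
    for step, a in enumerate(actions):
        assert a != state                  # outcome is therefore non-deterministic
        nxt = actions[step + 1] if step + 1 < len(actions) else a
        state = other(nxt)                 # legal: the outcome may be s1 or s2
        trace.append(state)
    return trace


def avoid_last(history):
    """Hypothetical adversary rule that sees only the past actions."""
    return history[-1] if history else 1


def randomized_run(rng, pick_state):
    """Count steps until G when each action is a fair coin flip."""
    history = []
    state = pick_state(history)
    steps = 0
    while True:
        a = rng.choice((1, 2))
        steps += 1
        if a == state:                     # correct action: goal attained
            return steps
        history.append(a)
        state = pick_state(history)        # wrong action: adversary relocates


print(adversary_foils([1, 1, 2, 1, 2, 2]))  # [2, 2, 1, 2, 1, 1, 1]
rng = random.Random(1)
runs = [randomized_run(rng, avoid_last) for _ in range(20000)]
print(sum(runs) / len(runs))
```

The printed trace reproduces the failing transition sequence given earlier for A1; A1; A2; A1; A2; A2, and the sample mean of the randomized runs should be close to two, whatever rule is substituted for `avoid_last`.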

The  argument  of  the  example  above  is  essentially  a  worst-case  versus  expected- 
case  analysis.  It  may  seem  strange  to  compare  worst  and  expected  cases.  However, 
there  are  two  important  observations  to  take  from  this  example.  First,  there  is  a 
major  advantage  to  be  gained  by  considering  the  expected  case  rather  than  the  worst 
case.  This  is  because  the  task  of  attaining  the  goal  is  solvable  only  in  the  expected 
case,  not  in  the  worst  case.  Second,  the  expected  case  convergence  time  is  computed 
over  randomizing  decisions  actively  made  by  the  run-time  system,  not  over  externally 
defined  probability  distributions.  In  particular,  the  system  has  control  over  this 
expectation  on  any  attempt  to  complete  the  task.  It  is  not  an  expectation  computed 
over  different  possible  world  models  of  the  actions  A1  and  A2.  Rather  the  upper
bound  on  the  expectation  applies  for  every  possible  interpretation  of  the  underlying 
non-deterministic  model. 

As  a  final  comment,  let  us  observe  that  often  probabilistic  actions  may  have  the 
same  effect  as  active  randomization  on  the  system’s  part.  For  instance,  if  the
non-deterministic  transitions  of  A1  and  A2  were  probabilistic,  with  transition  probabilities
1/2,  then  the  system  could  simply  execute  action  A1  repeatedly.  No  randomization
would  be  required,  since  the  physics  of  the  problem  would  effectively  provide  the
required  randomization.  If  the  system  originally  started  from  state  s1,  the  strategy
would  succeed  in  a  single  step,  whereas  if  the  system  started  from  state  s2,  the  strategy
would  succeed  in  an  expected  time  of  three  steps.
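
The figure of three steps follows from a short first-step calculation. As a sketch, write T_1 and T_2 (symbols introduced here for illustration) for the expected number of executions of A1 needed to reach G from s1 and from s2, assuming each non-deterministic transition occurs with probability 1/2:

```latex
\begin{align*}
T_1 &= 1, \\
T_2 &= 1 + \tfrac{1}{2}\,T_2 + \tfrac{1}{2}\,T_1
\quad\Longrightarrow\quad
\tfrac{1}{2}\,T_2 = 1 + \tfrac{1}{2}
\quad\Longrightarrow\quad
T_2 = 3 .
\end{align*}
```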


1.4  Previous  Work 

Work  on  planning  in  the  presence  of  uncertainty  goes  back  in  time  as  far  as  one  can 
imagine.  Credit  for  the  modern  approach  probably  goes  to  Richard  Bellman  [Bell], 
who  formulated  the  dynamic  programming  approach  that  underlies  much  of  optimal 
control  and  decision  theory.  His  ideas  were  themselves  based  to  some  extent  on  the 
calculus  of  variations  and  game  theory.  See  [Bert]  for  an  introduction  to  dynamic 
programming  in  the  discrete  domain,  and  see  [Stengel]  for  an  overview  of  techniques 
in  optimal  control. 

1.4.1  Uncertainty 

Within  the  domain  of  robotics,  uncertainty  has  always  been  a  central  problem. 
Much  of  the  work  on  compliant  motion  planning  was  motivated  by  a  desire  to 
compensate  for  uncertainty  in  control  and  inaccuracies  in  the  modelling  of  parts. 
The  aim  was  to  take  advantage  of  surface  constraints  to  guide  assembly  operations. 
Inoue  [Inoue]  used  force  feedback  to  perform  peg-in-hole  assembly  operations  at 
tolerances  below  the  inherent  positional  accuracy  of  his  manipulator.  Simunovic 
(see  [Sim75]  and  [Sim79])  considered  both  Kalman  filtering  techniques  in  position 
sensing  and  the  use  of  force  information  to  guide  assembly  operations  in  the  presence 
of  uncertainty.  In  conjunction  with  this  work  there  grew  an  interest  in  friction  and 
the  modelling  of  contact  to  describe  the  possible  conditions  under  which  an  assembly 
could  be  accomplished  successfully.  See  [NWD],  [Drake],  [OHR],  [OR]  and  [Whit82]. 
More  recent  work  with  an  emphasis  on  understanding  three-dimensional  peg-in-hole 
assemblies  in  the  presence  of  friction  and  uncertainty  includes  [Caine]  and  [Sturges]. 

1.4.2  Compliance 

The  formalization  and  understanding  of  compliant  motion  techniques  received  several 
major  boosts.  Whitney  [Whit77]  introduced  the  notion  of  a  generalized  damper  as  a 
way  of  simplifying  the  apparent  behavior  of  a  system  at  the  task  level.  The  generalized 
damper  is  a  first-order  description  of  a  system.  A  zeroth-order  description  is  given 
by  a  generalized  spring.  In  this  direction,  Salisbury’s  [Sal]  work  on  generalized 
springs  provided  a  means  of  stiffness  control  for  six  degrees  of  freedom.  Several 
researchers  considered  a  form  of  control  known  as  hybrid  control  (see  the  article 
[Mas82b]  for  a  pointer  to  these  various  researchers,  and  more  generally  the  book
[BHJLM]).  The  work  of  Mason  [Mas81]  contributed  to  the  understanding  of  compliant
motions  by  modelling  and  analyzing  compliance  in  configuration  space.  In  particular,
he  introduced  and  formalized  the  ideas  of  hybrid  control,  showing  how  these  could
be  modelled  naturally  on  surfaces  in  configuration  space.  The  basic  approach  is  to
maintain  contact  with  an  irregular  and  possibly  unknown  surface,  by  establishing  a
force  of  contact  normal  to  the  surface,  while  position-controlling  directions  tangential
to  the  surface  of  contact.  In  short,  uncertainty  is  overcome  in  some  dimensions. 
Raibert  and  Craig  [RC]  describe  a  combination  of  position  and  force  control  in  their 
implementation  of  a  hybrid  control  system.  See  also  [Inoue]  and  [PS]  for  earlier  work 
on  hybrid  control. 

1.4.3  Configuration  Space  and  Motion  Planning 

The  notion  of  configuration  space  was  introduced  into  robotics  by  Lozano-Perez  (see 
[Loz81]  and  [Loz83]),  as  a  means  of  characterizing  a  robot’s  degrees  of  freedom 
and  the  constraints  imposed  on  those  degrees  of  freedom  by  objects  in  the  world. 
A  point  in  configuration  space  corresponds  to  a  configuration  of  the  robot  in  real 
space.  Thus  configuration  space  is  a  means  of  transforming  a  complicated  motion 
planning  problem  into  the  problem  of  planning  the  motion  of  a  point  in  a  (possibly) 
higher-dimensional  space  whose  axes  are  the  robot’s  degrees  of  freedom.  The  roots 
of  these  ideas  may  be  found  in  [Udupa],  who  transformed  the  problem  of  moving 
a  robot  among  a  set  of  obstacles  into  the  problem  of  moving  a  point  among  a  set 
of  transformed  obstacles.  See  also  [Loz76],  who  used  configuration  space  in  the 
context  of  grasping  parts.  The  motivation  for  configuration  space  was  initially  to 
solve  the  obstacle  avoidance  problem.  In  particular,  the  configuration  space  of  an
object  provides  a  geometric  description  of  the  set  of  collision-free  configurations  of
the  object,  and  thus  the  basis  for  planning  algorithms.  Much  work  has  occurred  in
obstacle  avoidance  since  then;  see  below  for  a  partial  list.

An  important  observation  made  by  Mason’s  paper  [Mas81]  is  that  configuration 
space  possesses  dynamic  properties  as  well  as  purely  kinematic  properties.  Thus  the 
normals  to  configuration  space  surfaces  have  dynamic  significance.  In  particular, 
one  can  push  on  a  configuration  space  surface  and  experience  a  reaction  force. 
This  observation  meant  that  hybrid  control  could  be  viewed  nicely  in  configuration
space.  Additionally,  the  dynamic  information  of  configuration  space  was  later  used 
by  Erdmann  [Erd84]  to  model  friction  in  configuration  space. 

As  we  have  indicated,  much  of  the  geometric  work  on  motion  planning  provided 
a  foundation  for  the  subsequent  and  parallel  work  on  planning  with  uncertainty. 
Investigation  of  the  motion  planning  problem  finds  its  roots  in  the  works  of  Brooks 
[Brooks83];  Lozano-Perez  and  Wesley  [LPW];  Reif  [Reif];  Schwartz  and  Sharir  [ScShII]
and  [ScShIII];  and  Udupa  [Udupa].  For  further  foundational  work  in  the  area,
both  for  a  single  robot  and  for  several  moving  robots,  see  Brooks  and  Lozano-Perez
[BLP];  Lozano-Perez  [Loz86];  Canny  [Can88];  Canny  and  Donald  [CD];  Donald
([Don84]  and  [Don87a]);  Erdmann  and  Lozano-Perez  [ELP];  Fortune,  Wilfong,  and
Yap  [FWY];  Kant  and  Zucker  [KZ];  Khatib  [Khatib];  Koditschek  [Kodit];  Hopcroft,
Joseph,  and  Whitesides  [HJW];  Hopcroft,  Schwartz,  and  Sharir  [HSS];  Hopcroft  and
Wilfong  ([HW84]  and  [HW86]);  O’Dunlaing  and  Yap  [ODY];  O’Dunlaing,  Sharir
and  Yap  [ODSY];  Reif  and  Sharir  [RS];  Spirakis  and  Yap  [SpY];  and  Yap  ([Yap84]
and  [Yap86]).  This  is  by  no  means  an  exhaustive  list.  Much  research  has  been  done.
Some  books  with  excellent  survey  articles  include  [SHS],  [SY],  and  [KCL]. 

We  will  not  discuss  this  work  in  detail,  but  instead  focus  more  on  the  development 
of  the  work  on  uncertainty. 

1.4.4  Planning  for  Errors 

The  generalized  spring  and  generalized  damper  approaches  provided  a  new  set  of 
primitives  with  which  one  could  reduce  uncertainty  in  specific  local  settings.  In 
parallel  with  this  work  there  arose  a  desire  to  synthesize  entire  planning  systems 
that  could  account  for  uncertainty.  Early  work  considered  parameterizing  strategies 
in  terms  of  quantities  that  could  vary  with  particular  problem  instantiations.  The 
skeleton  strategies  of  Lozano-Perez  [Loz76]  and  Taylor  [Tay]  offered  a  means  of 
relating  error  estimates  to  strategy  specifications  in  detail.  In  particular,  Lozano-Perez’s
Lama  system  used  geometric  simulation  of  plan  steps  to  decide  on  possible
motion  outcomes.  The  simulation  made  explicit  the  possible  errors  that  could  occur. 
This  information  could  be  used  to  restrict  certain  parameters  or  to  introduce  extra 
sensing  operations.  Taylor’s  system  used  symbolic  reasoning  to  restrict  the  values 
of  parameters  in  skeleton  strategies  in  order  to  ensure  successful  motions.  Brooks 
[Brooks82]  extended  this  approach  using  a  symbolic  algebra  system.  His  system  could 
be  used  both  to  provide  error  estimates  for  given  operations,  as  well  as  to  constrain 
task  variables  or  add  sensing  operations  in  order  to  guarantee  task  success.  Along 
a  slightly  different  line,  Dufay  and  Latombe  [DL]  developed  a  system  that  observed 
execution  traces  of  proposed  plans,  then  modified  these  using  inductive  learning  to 
account  for  uncertainty. 

1.4.5  Planning  Guaranteed  Strategies  using  Preimages 

In  1983,  Lozano-Perez,  Mason,  and  Taylor  [LMT]  proposed  a  planning  framework 
for  synthesizing  fine-motion  strategies.  This  approach  is  sometimes  referred  to  as  the 
preimage  framework.  This  is  because  the  framework  generates  plans  by  recursively 
backchaining  from  the  goal.  Each  backchaining  step  generates  a  collection  of  sets, 
known  as  preimages,  from  which  entry  into  the  goal  is  guaranteed.  This  framework  has 
strong  connections  to  the  dynamic  programming  approach  mentioned  above,  which 
will  be  discussed  further  in  the  thesis.  The  preimage  framework  directly  incorporated 
the  effect  of  uncertainty  into  the  planning  process.  In  particular,  the  framework 
made  clear  how  sensing  operations  as  well  as  mechanical  operations  could  be  used  to 
reduce  uncertainty.  An  example  of  a  mechanical  operation  that  reduces  uncertainty 
is  a  guarded  move.  During  a  guarded  move  a  robot  moves  in  the  direction  of  an 
object  located  at  an  unknown  distance,  until  contact  with  the  object  is  established. 
Thus  the  uncertain  location  of  the  object  becomes  known  with  precision,  relative
to  the  location  of  the  robot.  Guarded  moves  are  discussed  in  [WG].  Earlier  work
using  guarded  moves  includes  [Ernst].  Mason  [Mas84]  showed  that  the  preimage
planning  approach  is  correct  and  bounded-complete.  This  means  that  if  any  system 
can  solve  a  motion  planning  problem  given  the  uncertainty  and  dynamics  assumed 
within  the  preimage  framework,  then  in  fact  the  preimage  framework  will  also  provide 
a  solution. 
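
The backchaining idea can be illustrated on a finite state graph. The sketch below is a deliberately simplified, discrete analogue and not the LMT framework itself, which operates on continuous configuration spaces with control and sensing uncertainty. It assumes the three-state system of Figure 1.17 with goal recognition as the only sensing, represents the system's knowledge as the set of non-goal states it might occupy, and backchains from the empty set (goal recognized) to find every knowledge set with a guaranteed plan. All names (`OUTCOMES`, `successor`, `backchain`) are hypothetical.

```python
from itertools import combinations

# Assumed transition model of Figure 1.17:
# OUTCOMES[(state, action)] is the set of states the action may lead to.
OUTCOMES = {
    ("s1", "A1"): {"G"},
    ("s1", "A2"): {"s1", "s2"},
    ("s2", "A1"): {"s1", "s2"},
    ("s2", "A2"): {"G"},
}
NON_GOAL = ("s1", "s2")
ACTIONS = ("A1", "A2")


def successor(knowledge, action):
    """Non-goal states the system may occupy after `action`, in the worst case."""
    result = set()
    for s in knowledge:
        result |= OUTCOMES[(s, action)] - {"G"}
    return frozenset(result)


def backchain():
    """Map each knowledge set with a guaranteed plan to (first action, step bound)."""
    plan = {frozenset(): (None, 0)}        # empty set: the goal has been recognized
    candidates = [frozenset(c)
                  for r in range(1, len(NON_GOAL) + 1)
                  for c in combinations(NON_GOAL, r)]
    changed = True
    while changed:                         # repeat until no new set can be added
        changed = False
        for k in candidates:
            if k in plan:
                continue
            for a in ACTIONS:
                nxt = successor(k, a)
                if nxt in plan:            # every outcome lands in a solved set
                    plan[k] = (a, plan[nxt][1] + 1)
                    changed = True
                    break
    return plan


plan = backchain()
print(plan.get(frozenset({"s1"})))         # ('A1', 1)
print(plan.get(frozenset({"s2"})))         # ('A2', 1)
print(plan.get(frozenset({"s1", "s2"})))   # None
```

In this toy model the knowledge sets {s1} and {s2} are each one step from the goal, while {s1, s2} never enters the backchained collection, which matches the earlier observation that no guaranteed strategy exists when the state is unknown.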

The  preimage  methodology  spawned  numerous  other  directions  of  research. 
Erdmann  [Erd84]  considered  the  issues  of  goal  reachability  and  recognizability.  He 
showed  that  for  some  variations  of  the  planning  problem,  the  task  of  computing 
preimages  can  be  separated  into  two  simpler  problems.  One  of  these  ensures  that 
the  system  will  reach  its  goal,  while  the  other  ensures  that  the  system  will  actually 
recognize  that  it  has  attained  the  goal.  In  general  these  issues  are  not  separable. 
Buckley  [Buc]  implemented  a  system  that  computed  multi-step  strategies  in  three- 
dimensional  cartesian  space.  His  planner  employed  a  discrete  state  graph  that 
modelled  the  possible  transitions  and  sensing  operations  in  an  And/Or  graph.  Turk
[Turk]  implemented  a  two-dimensional  backchaining  planner.

1.4.6  Sensorless  Manipulation

We  have  already  mentioned  the  importance  of  mechanical  operations  for  reducing 
uncertainty.  A  strong  champion  of  such  techniques  is  Mason.  See,  for  instance, 
[Mas82a],  [Mas85],  and  [Mas86].  In  particular,  Mason  has  looked  at  the  problem  of 
reducing  uncertainty  in  the  orientation  of  parts  by  pushing.  Building  on  this  work,
Brost  (see  [Brost85]  and  [Brost86])  has  implemented  a  system  that  can  orient  planar 
parts  through  a  series  of  pushing  and  squeezing  operations.  An  important  aspect 
of  these  strategies  is  that  they  do  not  require  sensing  at  the  task  level.  Instead, 
all  the  actions  are  open  loop,  relying  purely  on  the  mechanics  of  the  problem  to 
reduce  uncertainty.  Other  work  involving  the  reduction  of  uncertainty  without  sensing 
includes  the  work  by  Mani  and  Wilson  [MW]  on  orienting  parts  by  sequences  of 
pushing  operations,  the  work  by  Peshkin  [Pesh]  on  orienting  parts  by  placing  a  series 
of  gates  along  a  conveyer  belt,  and  the  graph  algorithms  of  Natarajan  [Nat86]  for 
designing  parts  feeders  and  planning  tray-tilting  operations.  A  tray-tilter  is  a  system 
that  orients  planar  parts  dropped  into  a  tray  by  tilting  the  tray.  Erdmann  and  Mason 
[EM]  investigated  this  problem,  designing  a  planner  based  on  the  mechanics  of  part 
interactions  with  the  walls  of  the  tray.  The  planner  expected  as  input  a  polygonal 
description  of  the  part  to  be  oriented  along  with  the  coefficient  of  friction.  The  output 
of  the  planner  consisted  of  a  sequence  of  tilting  operations  that  was  guaranteed  to 
orient  and  position  the  part  unambiguously,  if  such  a  sequence  existed.  A  robot
executed  the  motions  suggested  by  the  planner.  This  work  represents  a  specialization 
of  the  preimage  framework  to  the  sensorless  case,  in  which  only  mechanical  operations 
may  be  used  to  reduce  uncertainty.  The  idea  for  tray-tilting  came  from  work  by 
Grossman  and  Blasgen  [GB]  who  used  a  combination  of  tray-tilting  and  probing 
operations  to  ascertain  the  orientation  of  a  part  as  a  prelude  to  grasping  the  part. 
Taylor,  Mason,  and  Goldberg  [TMG]  introduced  sensing  back  into  the  tray-tilter,  as
a  means  of  investigating  the  relative  power  of  sensing  and  mechanical  operations.
They  developed  a  discrete  planning  system  based  on  an  And/Or  graph  similar  to
the  graph  used  in  Buckley’s  planner.

More  recent  work  includes  the  study  of  impact  by  Wang  (see  [Wang]  and  [WM]). 
Studying  impact  is  of  central  importance,  since  all  operations  in  which  objects  make 
contact  involve  impact.  Generally,  the  impact  occurs  at  scales  well  below  those 
available  to  current  sensors. 

1.4.7  Complexity  Results 

We  should  mention  some  hardness  results  regarding  the  motion  planning  problem 
in  the  presence  of  uncertainty.  Natarajan  [Nat88]  has  shown  the  problem  to  be 
PSPACE-hard  in  three  dimensions  for  polyhedral  objects.  Canny  and  Reif  [CR] 
have  shown  the  problem  to  be  hard  for  non-deterministic  exponential  time,  also  in 
three  dimensions.  In  general,  the  computability  and  complexity  of  the  problem  of 
planning  in  the  presence  of  uncertainty  is  not  known.  Erdmann  [Erd84]  showed  that 
the  problem  is  uncomputable  in  the  plane  if  the  environment  can  encode  arbitrary
recursive  functions.  However,  for  many  special  cases,  computable  algorithms  are
known.  Natarajan  [Nat86]  also  has  a  number  of  results  suggesting  fast  planning  times
for  restricted  versions  of  the  sensorless  manipulation  problem.  Donald  [Don89]  has
demonstrated  various  polynomial-time  algorithms  for  computing  single-step  strategies
in  the  plane,  assuming  restrictions  on  the  type  of  sensing  permitted.  In  particular,  all
motions  were  terminated  by  detecting  sticking  in  the  environment.  Donald  also  gave
a  single-exponential-time  algorithm  based  on  the  theory  of  real  closed  fields  for  the
multi-step  strategy.  Briggs  [Briggs]  extended  these  results  to  improve  the  performance
of  the  single-step  planner.  Also,  Canny  [Can89]  recently  exhibited  an  algorithm  based
on  the  theory  of  real  closed  fields  that  solves  the  general  motion  planning  problem 
under  uncertainty  for  those  cases  in  which  the  robot  trajectories  may  be  modelled  as 
algebraic  curves. 

1.4.8  Further  Work  on  Preimages 

Further  work  on  the  preimage  methodology  has  been  conducted  by  Latombe  [Lat] 
and  his  group.  This  work  includes  a  study  of  the  preimages  and  strategies  that  result 
from  the  use  of  various  termination  predicates,  in  addition  to  those  used  in  the  LMT 
preimage  methodology.  Others  who  have  looked  at  fine  motion  assembly  recently 
include  [Desai],  [Koutsou],  [LauTh],  and  [Valade].  We  also  refer  to  the  book  [KCL] 
for  a  review  of  other  relevant  literature. 

1.4.9  Guaranteed  Plans 

The  philosophy  of  the  preimage  methodology  is  to  generate  plans  that  are  guaranteed 
to  accomplish  some  task  despite  uncertainty  in  control,  sensing,  and  possibly  the 
geometry  of  the  environment.  The  framework  assumes  that  uncertainty  can  behave
as  a  worst-case  adversary,  within  specified  task-dependent  bounds.  If  a  given  subgoal
cannot  be  attained  with  certainty  assuming  this  worst-case  behavior,  then  the  task
is  deemed  unsolvable.  In  this  thesis  a  guaranteed  strategy  for  solving  a  task  will
therefore  refer  to  a  set  of  possibly  conditional  actions  that  are  certain,  in  the  presence
of  this  worst-case  uncertainty,  to  accomplish  the  task  in  a  bounded  predetermined 
amount  of  time. 

1.4.10  Error  Detection  and  Recovery 

An  important  offspring  of  the  LMT  preimage  planning  methodology  is  Donald’s 
recent  thesis  (see  [Don87b]  and  [Don89]).  This  work  deals  with  the  problem  of
representing  model  error  and  the  problem  of  Error  Detection  and  Recovery.  The 
need  for  error  detection  and  recovery  arises  naturally  if  one  permits  uncertainty  in 
the  geometric  shape  of  objects.  This  is  because  for  many  interesting  tasks  there 
simply  are  no  guaranteed  plans  in  the  sense  just  outlined.  An  example  that  Donald 
cites  is  the  task  of  inserting  a  peg  into  a  hole  in  which  the  size  of  the  hole  can  vary 
due  to  manufacturing  errors.  Certainly,  if  the  hole  is  smaller  than  the  peg,  then  the 
peg  cannot  be  inserted.  Nonetheless,  in  many  cases  the  hole  will  be  large  enough, 
and  it  would  be  foolish  not  to  try  to  insert  the  peg.  Donald  claims  that  a  robot 
should  attempt  certain  tasks  even  if  there  is  no  guarantee  of  success,  so  long  as  there 
is  a  guarantee  that  the  robot  will  be  able  to  ascertain  whether  or  not  its  attempt 
has  succeeded.  An  error  in  Donald’s  terminology  is  thus  more  subtle  than  the  usual 
notion  that  an  error  occurs  when  an  action  does  not  have  the  desired  outcome.  An 
error  for  which  one  can  plan  a  recovery  prior  to  execution  time  is  not  really  an  error, 
merely  one  of  many  execution-time  conditions  for  which  the  system  needs  to  check 
before  deciding  on  its  next  action.  In  Donald’s  framework,  an  error  is  a  condition  of 
task  failure  for  which  it  is  impossible  to  plan  a  recovery  at  planning  time.  Thus  the 
claim  is  that  a  robot  should  attempt  tasks  even  if  an  error  is  possible,  so  long  as  the 
error  is  recognizable.  Donald’s  formulation  makes  use  of  the  preimage  methodology  in 
defining  how  a  strategy  operates.  In  particular,  his  definition  of  failure  and  the  error 
recognizability  condition  are  based  on  the  preimage  constructs  of  reachability  and 
recognizability.  These  are  determined  by  the  dynamics  of  the  task  and  the  available 
sensors  and  termination  predicates. 

The  important  contribution  of  Donald’s  work  is  that  it  moved  away  from  the 
requirement  that  a  strategy  be  guaranteed  to  solve  a  task  in  order  to  be  considered 
a  strategy.  This  is  an  important  and  subtle  point,  that  forms  the  motivation  for 
the  current  thesis.  By  permitting  strategies  to  fail,  one  can  vastly  increase  the 
class  of  tasks  that  one  would  consider  solvable.  Indeed,  it  is  clear  that  in  some 
completely  imperfect  world,  no  task  is  ever  guaranteed  to  be  solvable  assuming
worst-case  adversaries.  The  real  world  is  such  a  world.  Yet  many  tasks  are  solvable  simply
because  they  are  attainable  sometimes.  Donald’s  thesis  made  this  notion  very  precise. 
The  aim  of  the  current  thesis  is  to  extend  some  of  these  ideas,  by  considering  tasks 
that  are  solvable  in  an  expected  sense.  Of  great  importance  is  the  ability  to  loop 
and  try  again,  as  suggested  in  Donald’s  thesis.  In  a  worst-case  sense,  looping  does
not  help,  since  the  strategy  can  always  fail.  However,  by  introducing  the  notions
of  probabilistic  failure,  either  through  actions  that  have  probabilistic  outcomes  or 
through  active  randomization  of  run-time  decisions,  one  can  often  guarantee  task 
solvability  in  an  expected  sense. 


1.4.11  Randomization 

In  a  slightly  different  direction,  we  should  mention  that  randomization  is  a  technique 
that  is  sometimes  used  in  optimizing  algorithms.  The  simulated  annealing  approach 
[KGV]  is  a  well-known  technique.  Roughly  speaking,  the  randomization  of  simulated
annealing  helps  to  avoid  local  minima.  For  any  given  level  of  randomization  the 
system  naturally  converges  to  some  subset  of  the  state  space.  By  reducing  the  level 
of  randomization  in  a  principled  manner,  this  subset  is  made  to  converge  to  the 
desired  optimal  states.  In  the  context  of  this  thesis,  randomization  is  used  to  avoid 
deterministic  traps.  This  is  similar  to  the  avoidance  of  local  minima.  However,  there 
is  no  notion  of  changing  the  level  of  randomization  in  order  to  ensure  convergence. 
Indeed,  for  the  most  part  we  will  assume  that  a  desired  goal  is  recognizable  upon  entry. 
More  general  strategies  might  relax  this  assumption,  relying  instead  on  a  probabilistic 
prediction  function  to  ensure  that  the  goal  is  attained  with  high  reliability. 

Randomization  has  also  been  used  in  the  domain  of  mobile  robots.  See  for  instance 
[Arkin],  who  injects  noise  into  potential  fields  in  order  to  avoid  plateaus  and  ridges. 
[BL]  have  also  investigated  a  Monte-Carlo  approach  for  escaping  from  local  minima 
in  potential  fields. 

Some  probabilistic  work  has  aimed  at  facilitating  the  design  process.  For  instance, 
[BRPM]  have  considered  the  problem  of  determining  the  natural  resting  distributions 
of  parts  in  a  vibratory  bowl  feeder.  This  information  is  useful  for  designing  both  part 
shapes  and  bowl  feeders. 

[Goldberg]  is  currently  investigating  probabilistic  strategies  for  grasping  objects.
That  work,  in  parallel  with  the  work  of  this  thesis,  is  also  interested  in  the  development 
of  a  general  approach  towards  the  analysis  and  synthesis  of  randomized  strategies  for 
manipulation  tasks. 


1.5  Thesis  Contributions 

The  contributions  of  this  thesis  lie  both  in  adding  randomization  to  the  theory 
of  manipulation  and  in  the  practical  demonstration  of  an  assembly  task  using 
randomization.  The  major  contributions  of  the  thesis  are: 

•  Implementation  of  a  Randomized  Peg-In-Hole  Task  on  a  PUMA. 

This  experiment  demonstrated  the  feasibility  and  usefulness  of  randomization 
in  assembly  operations.  The  sensors  available  to  the  system  consisted  of  joint 
encoders  on  the  robot  and  a  camera  positioned  above  the  assembly.  The  camera 
was  used  to  obtain  an  approximate  position  of  the  edges  of  the  peg  and  the  hole. 
These  were  used  to  suggest  a  nominal  motion.  If  no  edges  could  be  obtained  then
the  robot  would  execute  a  randomizing  motion.  The  system  was  intentionally
not  calibrated  very  well,  in  order  to  test  the  ability  of  the  randomizing  actions 
to  overcome  incomplete  information. 

•  Introduction  of  a  Formal  Approach  for  Synthesizing  Randomized 
Strategies.  There  exist  established  formalisms  for  generating  guaranteed 
or  optimal  strategies  in  the  presence  of  uncertainty.  The  LMT  preimage 
methodology  and  dynamic  programming  are  two  such  formalisms.  This  thesis 
builds  on  these  approaches  to  include  randomizing  actions.  Randomization  is 
seen  as  another  operator,  called  SELECT,  that  randomly  chooses  between  a 
collection  of  partial  strategies,  under  the  assumption  that  the  preconditions  of 
at  least  one  such  partial  strategy  are  satisfied.  Partial  strategies  are  generated 
by  backchaining  from  the  goal.  The  thesis  elucidates  the  conditions  under  which
this  approach  is  expected  to  complete  a  task. 

•  Analysis  of  a  Randomized  Strategy  with  a  Biased  Sensor.  The  thesis 
presents  a  detailed  example  in  which  sensing  error  consists  of  a  pure  bias. 
The  bias  is  unknown  but  of  bounded  magnitude.  It  is  shown  that  a  strategy 
which  interprets  the  sensor  as  correct  can  fail  to  attain  the  goal.  In  contrast,  a 
randomized  strategy  can  avoid  inaccurate  information  produced  by  the  sensor 
while  ensuring  eventual  goal  attainment.  Furthermore,  the  randomized  strategy 
can  rapidly  attain  the  goal  from  certain  start  regions. 

•  Nominal  Plans.  The  thesis  introduces  the  notion  of  a  collection  of  nominal 
plans  as  the  choice  set  for  a  randomized  strategy.  This  approach  is  a  special  case 
of  the  general  planning  methodology  for  synthesizing  randomized  strategies. 
The  nominal  plans  play  the  role  of  the  partial  strategies  in  that  methodology. 
The  difference  is  that  the  nominal  plans  are  themselves  generated  as  guaranteed 
plans  assuming  favorable  instantiations  of  error  parameters.  In  particular,  in 
this  thesis,  the  nominal  plans  are  strategies  that  are  guaranteed  to  succeed  in 
the  absence  of  uncertainty.  A  randomized  strategy  tries  to  follow  these  nominal 
plans  as  well  as  possible  despite  uncertainty. 

•  Progress  Measures.  Nominal  plans  sometimes  define  a  progress  measure  on 
state  space.  This  is  because  nominal  plans  specify  the  ideal  behavior  of  a  system 
in  solving  a  task.  Formally,  a  progress  measure  is  a  real-valued  function  on  a 
system's  state  space  that  is  zero  at  the  goal  and  positive  elsewhere.  Distance 
from  the  goal  is  a  possible  progress  measure.  If  a  strategy  can  guarantee  that 
it  makes  sufficient  progress  on  average  at  each  point  of  the  state  space,  then 
expected  goal  convergence  is  certain  to  be  rapid. 

•  Simple  Feedback  Loops.  Strategies  that  only  consider  current  sensed 
information  in  making  decisions  are  simple  feedback  loops.  The  thesis  introduces 
randomized  simple  feedback  loops.  These  try  to  make  progress  relative  to  a 
progress  measure  whenever  possible  and  otherwise  execute  a  random  motion.

•  Random  Walks.  The  thesis  studies  random  walks,  as  these  define  the  most
basic  type  of  randomized  strategy.  A  random  walk  forms  a  good  model  for  the 
behavior  of  a  randomized  simple  feedback  loop  in  the  presence  of  probabilistic 
errors.  The  thesis  introduces  the  notion  of  an  expected  velocity  as  the  expected 
change  in  the  progress  labelling  of  a  random  walk.  The  thesis  proves  that 
this  expected  velocity  possesses  properties  similar  to  those  of  a  deterministic 
velocity.  In  particular,  if  a  strategy  everywhere  makes  expected  progress  towards 
a  goal,  and  if  the  progress  measure  consists  of  small  numbers,  then  expected 
convergence  to  the  goal  must  be  rapid. 


•  Analysis  of  a  Randomized  Simple  Feedback  Loop  in  the  Presence  of 
Unbiased  Gaussian  Noise.  The  thesis  considers  a  simple  feedback  loop  for 
attaining  a  two-dimensional  region  in  the  plane.  This  is  an  abstraction  of  the 
peg-in-hole  problem.  The  system  has  available  to  it  a  position  sensor  and  a 
goal  recognizer.  The  strategy  is  formulated  assuming  only  that  specific  bounds 
may  be  placed  on  the  error  distributions  that  describe  the  sensing  and  control 
errors.  Thus  the  strategy  is  known  to  converge  eventually  for  all  errors  satisfying 
these  bounds.  For  an  analysis  of  the  strategy,  the  sensing  and  control  errors 
are  each  assumed  to  be  unbiased  two-dimensional  normal  variates.  The  thesis 
shows  numerically  for  a  particular  example  that  the  convergence  properties  of 
this  randomized  strategy  are  substantially  better  than  those  for  a  corresponding 
guaranteed  strategy.  In  particular,  the  region  of  fast  convergence  is  considerably 
greater. 


•  Finite  Guesses.  On  discrete  spaces  the  operator  SELECT  naturally  only  needs 
to  guess  between  a  finite  number  of  possible  strategies.  Thus  the  probability  of 
guessing  an  appropriate  strategy  is  non-zero.  In  the  continuous  domain,  it  may 
be  necessary  to  guess  over  an  infinite  number  of  strategies.  However,  the  thesis 
shows  that  under  suitable  conditions  only  guesses  over  finite  sets  are  required. 
The  conditions  amount  to  the  requirement  that  whenever  the  system  executes 
a  motion  for  some  time,  the  predicted  possible  locations  of  the  system  at  that 
time  form  an  open  set. 


•  Near-Sensorless  Tasks  are  defined  as  tasks  in  which  there  is  no  sensing 
except  to  signal  goal  attainment.  Blindly  inserting  a  key  into  a  hole  is  one 
such  task.  The  thesis  shows  that  in  this  context  there  are  tasks  for  which 
there  exist  guaranteed  solutions  that  require  an  exponential  amount  of  time  to 
execute  in  the  worst  case,  while  there  exist  randomized  solutions  that  require  a 
polynomial  amount  of  time  to  attain  the  goal,  in  an  expected  sense.  This  result 
demonstrates  that  randomization  need  not  necessarily  increase  convergence 
times. 
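
The expected-velocity idea in the Progress Measures and Random Walks items above can be illustrated on the simplest possible case. The sketch below is not the thesis's analysis; it assumes a hypothetical one-dimensional walk on the integers whose progress measure is the distance to a goal at 0, and which steps towards the goal with a fixed probability. For this walk the expected velocity is 2p - 1 per step, and the expected time to reach the goal from distance d is d / (2p - 1). All function names are illustrative.

```python
import random


def hitting_time(start, p_toward_goal, rng, max_steps=1_000_000):
    """Steps for a walk on the integers to first reach 0 from `start`.

    The progress measure is the current position (distance from the goal).
    Each step lowers it by 1 with probability p_toward_goal, else raises it by 1.
    """
    position, steps = start, 0
    while position > 0 and steps < max_steps:
        position += -1 if rng.random() < p_toward_goal else 1
        steps += 1
    return steps


def mean_hitting_time(start, p_toward_goal, trials=2000, seed=0):
    """Average hitting time over independent simulated runs."""
    rng = random.Random(seed)
    total = sum(hitting_time(start, p_toward_goal, rng) for _ in range(trials))
    return total / trials


def predicted_time(start, p_toward_goal):
    """Initial progress value divided by the expected progress per step."""
    expected_velocity = 2 * p_toward_goal - 1
    return start / expected_velocity
```

Comparing `mean_hitting_time(20, 0.75)` with `predicted_time(20, 0.75)` shows the simulated average tracking the ratio of initial progress to expected velocity, which is the sense in which an expected velocity behaves like a deterministic one.
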

1.6  Thesis  Outline

Chapter  2  presents  a  more  detailed  outline  of  the  thesis.  This  chapter  also  contains 
further  motivational  material.  The  chapter  is  intended  both  as  a  second  introductory 
chapter  and  as  a  precis  of  the  thesis. 

Chapter  3  develops  the  basic  approach.  This  is  done  in  the  discrete  setting,  for 
simplicity.  Fortunately,  many  of  the  results  carry  over  to  the  continuous  domain.  The 
basic  idea  is  to  use  the  traditional  methodology  for  computing  guaranteed  plans  as  a 
means  of  suggesting  partial  or  nominal  plans.  Sensing  uncertainty  may  prevent  the 
system  from  satisfying  the  preconditions  of  any  particular  nominal  plan.  However, 
in  some  cases  the  system  can  readily  satisfy  the  union  of  all  the  plans’  preconditions. 
Then  it  makes  sense  for  the  system  to  randomly  and  repeatedly  choose  and  execute 
a  nominal  plan.  The  hope  is  that  the  system  will  eventually  choose  a  plan  whose
preconditions  are  satisfied,  and  which  therefore  will  successfully  accomplish  the  task. 
Chapter  3  considers  the  conditions  under  which  this  type  of  strategy  may  be  applied. 
Of  particular  interest  are  tasks  in  which  there  is  a  progress  measure.  If  the  system 
can  locally  make  progress  on  the  average,  then  the  overall  expected  convergence  time 
may  be  bounded  readily. 
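
The idea of repeatedly guessing a nominal plan can be sketched in a few lines. This is a toy model under strong assumptions that are ours, not the thesis's: exactly one of the plans has its preconditions satisfied, a plan whose preconditions fail leaves the system where it was, and success is recognizable. The names `select_and_retry` and `mean_rounds` are hypothetical. Under these assumptions the number of rounds is geometrically distributed, so with k plans chosen uniformly the expected number of rounds is k.

```python
import random


def select_and_retry(plans, precondition_holds, rng, max_rounds=10_000):
    """Pick a nominal plan uniformly at random until one succeeds.

    plans: list of plan labels (stand-ins for nominal plans).
    precondition_holds: function(plan) -> bool, unknown to the chooser.
    Returns the number of rounds used, or None if max_rounds is exhausted.
    """
    for round_number in range(1, max_rounds + 1):
        plan = rng.choice(plans)        # the random guess
        if precondition_holds(plan):    # only an applicable plan succeeds
            return round_number
    return None


def mean_rounds(num_plans, trials=5000, seed=1):
    """Average number of rounds when exactly one plan is applicable."""
    rng = random.Random(seed)
    plans = list(range(num_plans))
    total = 0
    for _ in range(trials):
        applicable = rng.randrange(num_plans)
        total += select_and_retry(plans, lambda p: p == applicable, rng)
    return total / trials
```

For example, `mean_rounds(4)` should come out close to 4. The thesis's SELECT operator is more general, since the plans it chooses among may change the state even when they fail.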

Chapter  4  extends  this  approach  to  the  continuous  domain.  Some  subtleties  enter 
into  the  picture.  In  particular,  in  order  to  be  certain  of  eventual  convergence,  a 
randomized  strategy  should  only  make  guesses  that  have  a  non-zero  probability  of 
success.  Given  an  infinite  number  of  nominal  plans,  as  is  possible  in  the  continuous 
domain,  the  probability  of  guessing  correctly  may  actually  be  zero.  Chapter  4 
examines  this  problem  and  shows  that  often  it  is  reasonable  to  consider  only  a  finite 
number  of  nominal  plans. 
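
A one-dimensional toy case shows why a finite set of guesses can suffice. Suppose, purely for illustration, that the unknown target lies in the interval [0, 1] and that a plan aimed at a point works for every target within an open interval of half-width `tolerance` around that point. Guessing the target exactly from a continuous distribution succeeds with probability zero, but a finite grid of aim points then contains at least one successful guess for every target. The functions below are hypothetical and are not taken from the thesis.

```python
import math


def finite_guess_set(tolerance):
    """Evenly spaced aim points on [0, 1].

    The cell width 1/n is at most `tolerance`, so every target is within
    tolerance/2 of some aim point.
    """
    n = math.ceil(1.0 / tolerance)
    return [(i + 0.5) / n for i in range(n)]


def guess_succeeds(guess, target, tolerance):
    """A plan aimed at `guess` works on an open interval of targets."""
    return abs(guess - target) < tolerance


def success_probability(target, tolerance):
    """Chance that one uniform draw from the finite guess set succeeds."""
    guesses = finite_guess_set(tolerance)
    hits = sum(guess_succeeds(g, target, tolerance) for g in guesses)
    return hits / len(guesses)
```

With `tolerance = 0.125` the guess set has 8 elements, and `success_probability` is at least 1/8 for every target in [0, 1], so each random guess has a non-zero chance of success.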

Chapter  5  analyzes  the  task  of  moving  a  point  on  the  plane  into  a  circle,  in  the 
presence  of  sensing  and  control  uncertainty.  This  is  a  natural  generalization  of  the 
peg-in-hole  problem  considered  at  the  beginning  of  the  thesis.  The  analysis  involves 
an  approximation  by  a  diffusion  process  that  establishes  fast  convergence  times  for  a 
range  of  goal  sizes. 
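
The flavor of the task can be conveyed by a small simulation. The sketch below is not the strategy analyzed in the thesis; it assumes a hypothetical feedback loop with a fixed step length, unbiased Gaussian sensing and control errors, an exact goal recognizer, and a random motion whenever the sensed position is too close to the goal center to give a trustworthy direction. All parameter values are arbitrary.

```python
import math
import random


def run_feedback_loop(start, goal_radius, step, sense_sigma, control_sigma,
                      rng, max_steps=10_000):
    """Move a point towards a disc centred at the origin; return steps used.

    Each cycle reads a noisy position, commands a fixed-length step towards
    the origin as judged from that reading, and applies it with control noise.
    Returns None if the goal is not reached within max_steps.
    """
    x, y = start
    for steps in range(max_steps):
        if math.hypot(x, y) <= goal_radius:          # goal recognizer
            return steps
        sx = x + rng.gauss(0.0, sense_sigma)         # noisy position sensor
        sy = y + rng.gauss(0.0, sense_sigma)
        dist = math.hypot(sx, sy)
        if dist > sense_sigma:
            ux, uy = -sx / dist, -sy / dist          # head for the sensed goal
        else:
            angle = rng.uniform(0.0, 2.0 * math.pi)  # randomizing motion
            ux, uy = math.cos(angle), math.sin(angle)
        x += step * ux + rng.gauss(0.0, control_sigma)
        y += step * uy + rng.gauss(0.0, control_sigma)
    return None


def simulate_trials(trials=200, seed=3):
    """Step counts for repeated runs from the same start point."""
    rng = random.Random(seed)
    return [run_feedback_loop((10.0, 0.0), 1.0, 0.5, 0.5, 0.1, rng)
            for _ in range(trials)]
```

The loop uses only the current sensed value, never any history, which is what makes it a simple feedback loop in the sense used above.
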

Chapter  2

Thesis  Overview  and  Technical 
Tools 


The  purpose  of  this  chapter  is  to  provide  a  basic  overview  of  the  thesis.  The  chapter 
is  intended  both  as  a  self-contained  summary  of  the  thesis  and  as  a  guideline  for
the  results  presented  in  the  remaining  chapters.  We  will  motivate  the  basic  problem,
present  the  technical  tools  and  definitions,  and  mention  the  main  results  of  the  thesis. 
All  this  will  be  done  at  a  fairly  high  level,  with  the  details  of  the  definitions  and  proofs 
left  for  future  chapters.  It  is  hoped  that  the  early  presentation  of  the  main  issues  will 
provide  a  cohesive  guideline  for  the  more  technical  points  of  the  later  chapters.

The  first  major  section  of  this  chapter  provides  a  high-level  perspective  and 
motivation.  The  second  section  is  concerned  with  basic  definitions.  Towards  the 
end  of  the  section  we  introduce  the  notion  of  randomized  strategies.  The  third  major 
section  of  the  chapter  presents  a  detailed  example  that  is  intended  to  highlight  the 
importance  of  randomized  strategies.  Finally,  the  last  section  discusses  in  some  more 
detail  the  particular  focus  on  randomized  strategies  taken  by  this  thesis. 


2.1  Motivation 

In  general  one  should  think  of  randomization  as  a  primitive  strategy,  and  thus  as  a  tool 
at  the  lowest  level.  One  should  not  forget  all  the  work  on  the  synthesis  of  strategies  for 
solving  tasks  in  the  presence  of  uncertainty.  Instead,  randomization  should  be  viewed 
as  an  operation  that  is  superimposed  on  top  of  the  work  for  generating  guaranteed 
strategies.  Indeed,  randomization  is  even  physically  superimposed  on  top  of  these 
strategies.  It  is  the  combination  of  sensing,  mechanics,  and  randomization 
that  achieves  a  task,  not  any  one  of  these  alone.  We  will  study,  primarily 
in  chapter  3,  strategies  that  judiciously  make  use  of  sensing,  predictive  ability,  and 
randomization.  The  physically  realizable  solutions  to  tasks  are  those  for  which  on 
the  average  progress  is  being  made  towards  the  goal.  The  randomization  ensures  that 
partially  modelled  system  parameters  may  be  ignored,  while  the  sensing  and  task 
mechanics  ensure  that  progress  is  made  towards  the  goal  whenever  the  randomization
has  placed  the  system  in  a  fortuitous  position.

2.1.1  Domains  of  Applicability 

The  broadly  intended  domains  of  applicability  for  the  material  presented  in  this  thesis 
are: 


•  Parts  Assembly  and  Manipulation. 

-  In  the  presence  of  sensing  and  control  uncertainty. 

-  In  environments  with  sparse  or  incomplete  models.

-  During  the  fine-motion  phase  of  tight  assemblies. 

-  For  parts  orientation  and  localization. 

(And  combinations  of  these  scenarios.) 

•  Mobile  Robot  Navigation. 

-  With  noisy  sensors. 

-  In  uncertain  environments. 

•  Facilitate  Design. 

-  Of  special  purpose  sensors  useful  for  solving  particular  tasks. 

-  Of  parts  shaped  to  permit  easy  mechanical  assembly. 

The  main  focus  of  the  thesis  is  within  the  first  domain  on  this  list.  This  domain
consists  of  tasks  involving  the  assembly  and  manipulation  of  parts.  Examples  include
the  mating  of  two  or  more  parts,  the  grasping  of  a  part,  and  the  orienting  and 
localization  of  one  or  more  parts  whose  initial  configurations  are  unknown.  By 
localization  we  mean  the  constraining  of  a  part’s  configuration  in  a  purposeful  manner, 
possibly  as  a  prelude  to  some  other  operations.  The  archetypical  example  of  a  parts 
mating  operation  is  given  by  the  task  of  inserting  a  peg  into  a  hole.  This  is  a  classic 
example,  yet  its  generality  remains.  This  generality  stems  from  the  observation  that 
almost  any  assembly  involving  rigid  or  nearly-rigid  bodies  may  be  viewed  locally  as
a  peg-in-hole  assembly.  The  tasks  of  grasping  and  orienting  parts  are  themselves 
fundamental  to  manipulation.  In  order  to  assemble  two  parts,  these  must  be  located 
and  manipulated.  The  manipulation  may  involve  grasping  or  it  may  involve  impact 
operations,  such  as  pushing  or  hitting.  In  some  broad  sense  grasping  subsumes  these 
latter  operations,  as  they  occur  naturally  at  some  scale  during  any  operation  involving 
the  contact  of  two  or  more  objects.  Finally,  parts  ultimately  must  be  oriented  and 
localized  in  order  to  be  assembled.  A  system  need  not  necessarily  be  cognizant  of  the 
localization  operation,  yet  localization  must  occur  at  either  the  mechanical  or  sensing
levels.  More  generally,  a  task  that  involves  the  transfer  of  objects  from  a  state  of
high  entropy  to  an  assembled  state,  such  as  the  task  of  picking  a  part  out  of  a  bin 
containing  several  different  randomly  oriented  parts,  determining  the  part’s  pose,  and 
then  placing  it  in  some  constrained  locale,  requires  variations  of  all  of  these  operations. 
In  particular,  almost  by  definition,  such  a  task  requires  considerable  localization. 

Most  of  the  results  of  the  thesis  will  be  developed  with  the  inspiration  of  these 
examples  in  mind.  However,  the  results  are  sufficiently  general  that  they  may  be 
applied  to  domains  other  than  pure  manipulation.  Some  of  these  are  indicated  in  the 
list  above. 

2.1.2  Purpose  of  Randomization 

One  of  the  key  motivations  for  considering  randomized  strategies  is  given  by 
our  description  of  manipulation  tasks  in  the  presence  of  uncertainty  as  methods 
for  reducing  entropy.  Specifically,  parts  are  moved  from  a  disorganized  state 
into  an  assembled  state,  from  an  unknown  orientation  to  a  known  orientation, 
from  an  unconstrained  location  to  a  grasped  location,  and  so  forth.  Reducing 
entropy  is  generally  difficult,  requiring  considerable  information  about  the  world. 
Randomization  permits  the  view  of  an  organized  state  as  simply  one  of  many  random 
states.  By  actively  randomizing,  a  system  can  under  suitable  conditions  ensure  that 
it  will  eventually  pass  through  this  desired  state.  (The  suitable  conditions  effectively 
postulate  lower  bounds  on  the  probability  of  success.) 

Standard  approaches  for  solving  tasks  that  involve  the  reduction  of  uncertainty 
include: 

•  Perfection. 

-  Model  the  world  perfectly. 

-  Reduce  sensing  errors  to  zero. 

-  Reduce  control  errors  to  zero. 

•  Plan  for  Uncertainty. 

-  Use  sensing  when  possible  to  gain  information  from  the  environment. 

*  For  instance,  use  a  combination  of  position  and  force  sensors  in  order  to 
gain  more  information  than  either  sensor  could  provide  in  isolation.  As 
an  example,  a  force  sensor  might  register  contact  with  a  table,  while  a 
position  sensor  could  localize  that  contact  to  within  some  small  range. 

*  Build  special  sensors  to  detect  particular  system  states.  This  includes 
light  beams  at  finger  tips,  touch  sensors,  special  calibration  devices, 
lasers,  structured  light,  and  so  forth. 

-  Use  the  mechanics  of  the  domain  to  reduce  uncertainty. 

*  For  instance,  bump  into  an  object  in  order  to  reduce  the  uncertainty 
of  the  relative  position  of  that  object.

*  Drop  a  polyhedral  part  onto  a  table  in  order  to  reduce  its  orientations
to  a  manageable  number. 

*  Design  parts  and  feeding/assembly  devices  concurrently,  with  the  aim 
of  simplifying  the  grasping  or  localization  process. 

-  Strategically  combine  sensing  and  action. 

*  For  instance,  in  order  to  move  one  part  within  a  certain  distance  of 
another  object  whose  location  is  unknown,  it  makes  sense  to  first  bump 
the  part  into  that  object,  then  back  away  by  the  desired  distance,  if 
possible. 

•  Tolerate  Failure. 

-  Give  up  the  insistence  on  a  guaranteed  strategy  as  the  only  means  of 
solving  a  task. 

Accepting  Uncertainty 

The  assumption  that  the  world  is  perfect  is  much  too  strong  an  assumption  to  be
realistic.  Instead,  as  we  outlined  in  section  1.4,  much  effort  has  been  devoted  over 
the  last  few  decades  to  accounting  for  uncertainty  explicitly.  The  aim  has  been  to 
reduce  uncertainty  or  entropy  by  judicious  use  of  sensing  and  action.  The  difficulty 
with  such  approaches  is  that  they  tend  to  make  strong  assumptions  about  the  world. 
For  instance,  generally  those  frameworks  that  produce  guaranteed  plans  have  trouble 
dealing  with  tiny  variations  in  geometry.  A  strategy  that  slides  one  object  on  top 
of  another  may  fail  if  the  component  surfaces  contain  small  nicks  and  protrusions. 
Similarly,  if  a  sensing  error  is  larger  than  expected,  or  if  a  sensor  contains  an  unknown 
bias,  a  strategy  that  relies  crucially  on  the  validity  of  its  assumptions  will  fail.  This 
defeats  the  philosophy  motivating  the  construction  of  planners  that  explicitly  account 
for  uncertainty.  That  philosophy  states  that  one  should  from  the  outset  be  aware  of 
uncertainty,  rather  than  ignore  it  in  the  hope  that  the  plans  developed  for  a  perfect 
world  will  be  good  enough  in  the  face  of  uncertainty.  The  philosophy  is  defeated 
because  the  strategies  developed  in  the  quest  of  guaranteed  plans  are  only  as  good  as 
the  assumptions  preceding  them.  Of  course,  everyone  is  aware  of  this  dependence,  yet 
it  lingers.  More  importantly,  the  dependence  can  lead  to  the  desire  to  model  the  world 
accurately,  to  improve  one’s  sensors,  and  to  improve  one’s  control  systems,  solely  for 
the  sake  of  solving  a  particular  task  more  easily.  These  are  highly  worthwhile  goals, 
but  they  run  the  risk  of  ignoring  a  set  of  crucial  intellectual  questions: 

•  What  is  the  information  needed  to  solve  a  task? 

•  What  tasks  can  be  solved  by  a  given  repertoire  of  operations? 

•  How  sensitive  are  solutions  of  tasks  to  particular  assumptions  about  the  world?

Indeed,  the  design  of  better  systems  for  dealing  with  uncertainty  should  be
interwoven  with  the  investigation  of  these  questions.  The  answers  to  these  questions 
will  themselves  facilitate  the  design  of  better  systems  for  dealing  with  uncertainty 
and  will  improve  planning  technologies. 

A  key  approach  listed  above  is  that  of  tolerating  failure.  This  is  a  fairly  recent 
idea  within  the  formal  planning  methods  of  robotics  (see  [Don87b]).  It  is  important 
because  it  reminds  us  of  the  right  psychological  framework.  No  task  possesses  an 
absolutely  guaranteed  solution.  Instead  of  searching  for  guaranteed  solutions,  one 
should  try  to  answer  the  three  questions  above,  for  any  task  of  interest.  There  is 
a  spectrum  of  assumptions,  a  spectrum  of  strategies,  and  a  corresponding  spectrum 
of  outcomes  for  any  given  assumptions  and  strategy.  Failure  is  always  one  of  the 
possible  outcomes  in  this  spectrum.  The  question  is,  under  what  assumptions? 

Clearly,  the  work  on  uncertainty  over  the  past  several  decades  has  been  trying  to 
answer  the  three  questions.  They  remain  unanswered  in  generality.  This  thesis  is  one 
further  attempt  to  look  at  a  particular  aspect  of  the  answer  to  these  questions. 

Randomization  is  Everywhere 

Randomization  enters  into  the  investigation  of  these  questions  at  the  simplest  level. 
In  some  sense  randomization  is  omnipresent.  For  instance,  uncertainty  that  is  due  to 
noise,  either  in  sensing  or  control,  may  be  thought  of  as  randomization  on  the  part  of 
nature.  The  basic  issue  that  this  thesis  begins  to  address  is  how  active  randomization 
on  a  robot’s  part  can  aid  in  the  solution  of  tasks. 

Some  advantages  of  randomization  are: 

•  Increase  the  class  of  solvable  tasks. 

•  Reduce  the  dependence  of  task  solutions  on  assumptions  about  the  world. 

•  Simplify  the  planning  process. 

We  will  discuss  these  properties  more  throughout  the  thesis.  In  brief,  the  class  of 
solvable  tasks  is  increased  because  the  class  of  strategies  is  enlarged  beyond  the  class 
of  guaranteed  strategies.  Recall  that  a  guaranteed  strategy  is  certain  to  accomplish 
a  task  in  a  bounded  predetermined  number  of  steps.  Randomization  increases  the 
class  of  solvable  tasks  because  the  class  of  randomized  strategies  includes  strategies 
whose  success  is  not  guaranteed  on  any  particular  step,  but  merely  in  an  expected 
sense.  Randomization  decreases  dependence  on  assumptions  when  it  ensures  that  a 
system  will  eventually  behave  in  a  manner  compatible  with  unknown  or  unmodelled 
parameters.  There  are  limits,  of  course,  such  as  trap  states  or  degenerate  goals  that 
must  be  avoided  by  any  strategy.  Finally,  planning  is  simplified  whenever  a  planner 
may  substitute  a  simple  randomized  strategy  in  place  of  a  possibly  complicated 
guaranteed  strategy.  For  instance,  a  random  walk  is  a  simpler  strategy  than  a  spiral 
search.  It  requires  less  history,  although  it  may  require  more  time  to  converge  to  a 
desired  goal  region. 
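
The trade-off between history and convergence time can be made concrete on a grid. The sketch below is an illustration under our own assumptions: a square spiral on the integer lattice stands in for the spiral search, and the random walk lives on a bounded grid with moves clipped at the boundary. The spiral must carry its heading, leg length, and position along the leg; the random walk carries nothing.

```python
import random


def spiral_steps(target):
    """Steps a square spiral from (0, 0) needs to reach `target`.

    The search remembers its heading, the current leg length, and how far
    along the leg it is: this is the history the strategy carries.
    """
    x, y, steps = 0, 0, 0
    dx, dy = 1, 0                      # start heading east
    leg_length = 1
    while True:
        for _leg in range(2):          # leg lengths run 1, 1, 2, 2, 3, 3, ...
            for _move in range(leg_length):
                if (x, y) == target:
                    return steps
                x, y, steps = x + dx, y + dy, steps + 1
            dx, dy = -dy, dx           # turn left
        leg_length += 1


def random_walk_steps(target, half_width, rng):
    """Steps a memoryless walk on the grid [-h, h]^2 needs to reach `target`.

    Moves that would leave the grid are clipped at the boundary.
    """
    x, y, steps = 0, 0, 0
    while (x, y) != target:
        dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        x = max(-half_width, min(half_width, x + dx))
        y = max(-half_width, min(half_width, y + dy))
        steps += 1
    return steps


def mean_random_walk_steps(target, half_width, trials=200, seed=4):
    """Average random-walk hitting time over independent runs."""
    rng = random.Random(seed)
    total = sum(random_walk_steps(target, half_width, rng) for _ in range(trials))
    return total / trials
```

For a target in the corner of an 11 by 11 grid, `spiral_steps((5, 5))` gives the spiral's deterministic cost, while `mean_random_walk_steps((5, 5), 5)` estimates the larger average cost of the memoryless walk.
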

Eventual  Convergence

In  a  sense  we  may  think  of  randomization  as  a  means  of  traversing  the  state  space 
in  a  blind  manner.  Thus  randomization  forms  the  most  primitive  of  strategies  for 
solving  a  task.  By  performing  a  random  walk  in  state  space,  the  system  will,  under 
suitable  conditions,  eventually  pass  through  the  goal.  The  suitable  conditions  amount 
to  guaranteeing  a  minimum  probability  of  success. 
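
The role of a minimum probability of success can be made explicit with a standard geometric-trials argument. The symbols are introduced here for illustration only and are not the thesis's notation: p > 0 is a lower bound on the probability that any single attempt reaches the goal, whatever the current state, and T is the number of attempts made until the goal is reached.

```latex
\[
  \Pr[T > n] \;\le\; (1-p)^{n} \;\to\; 0 \quad (n \to \infty),
\]
\[
  \mathrm{E}[T] \;=\; \sum_{n \ge 0} \Pr[T > n]
  \;\le\; \sum_{n \ge 0} (1-p)^{n} \;=\; \frac{1}{p}.
\]
```

Thus a uniform lower bound on the success probability gives both eventual convergence with probability one and a bound on the expected number of attempts.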

Of  course,  there  are  some  disadvantages  to  randomization.  If  manipulation  tasks 
may  indeed  be  thought  of  as  means  of  reducing  entropy,  then  randomization  seems 
inappropriate.  Indeed,  one  would  expect  randomization  to  increase  entropy.  However, 
this  is  not  always  the  case.  Furthermore,  it  says  merely  that  one  might  have  to  wait  a 
long  time  before  the  system  attains  a  goal.  Other  difficulties  arise  in  ensuring  that  the 
randomization  actually  covers  the  space  of  interest,  that  is,  that  the  goal  is  reachable. 
A  third  difficulty  arises  in  terminating  a  strategy.  Somehow  there  must  be  appropriate 
information  that  enables  a  system  to  recognize  or  predict  goal  attainment.  All  these 
issues  will  be  dealt  with  in  the  thesis. 

Fast  Convergence 

Of  particular  interest  is  the  question  of  convergence  times.  It  clearly  would  be 
inappropriate  to  try  to  insert  a  peg  with  six  degrees  of  freedom  into  a  hole  using 
purely  random  motions.  The  hole  forms  a  relatively  small  region  within  the
six-dimensional  configuration  space  of  the  peg.  Finding  this  region  without  any  sensing
from  far  away  would  require  an  unreasonable  amount  of  time.  However,  if  one  can 
bring  the  peg  close  to  the  hole  using  available  sensors,  then  one  can  reduce  the  space 
that  must  be  searched.  If  one  can  also  remove  some  of  the  peg’s  degrees  of  freedom 
by  making  contact  with  portions  of  the  hole,  then  one  can  further  reduce  the  space 
that  needs  to  be  searched,  by  reducing  its  dimensionality. 


2.2  Basic  Definitions 

This  section  defines  the  basic  tools  used  by  the  thesis.  This  includes  the  spaces 
of  interest,  the  representation  of  uncertainty,  and  the  types  of  strategies  explored 
throughout  the  thesis. 

2.2.1  Tasks  and  State  Spaces 

A  task  is  modelled  as  a  problem  on  some  state  space.  The  state  space  may  be  discrete 
or  continuous.  The  state  space  should  consist  of  all  the  parameters  of  a  system  that 
are  required  to  predict  its  future  behavior.  In  other  words,  knowing  the  current  state 
of  the  system  and  some  action  applied  to  the  system,  it  should  be  possible  to  predict 
the  resulting  state  or  states  of  the  system  without  reference  to  past  states. 

A  task  is  specified  as  the  attainment  of  some  goal  region  in  state  space.  Sometimes 
a  starting  region  may  be  specified  as  well. 
Figure 2.1: This figure indicates three stable configurations of a planar Allen wrench
lying  on  a  horizontal  table.  These  configurations  may  be  used  to  define  a  discrete 
state  space. 


We  should  mention  briefly  that  the  configuration  space  [Loz83]  of  a  system  is  the 
space  describing  the  degrees  of  freedom  of  the  system.  For  instance,  the  configuration 
space  of  a  rigid  object  in  three  dimensions  is  a  six-dimensional  space  corresponding 
to  three  translational  and  three  rotational  degrees  of  freedom. 

The  relationship  between  the  state  space  and  the  configuration  space  of  a  system 
depends  on  the  dynamics  of  the  system.  For  simplicity,  we  often  assume  that  the 
dynamics  are  first-order  and  that  the  future  state  of  the  system  can  be  predicted 
from  its  current  configuration  and  an  applied  velocity.  In  that  sense  the  state  space 
and  the  configuration  space  are  identical.  We  will  thus  often  not  distinguish  between 
the  two  representations,  although  it  should  be  understood  that  this  is  not  sufficient 
if  the  dynamics  are  of  a  higher  order. 

Continuous  Space 

An  example  of  a  task  specified  in  a  continuous  space  is  given  by  the  peg-in-hole 
problem  of  section  1.1.  The  relevant  state  space  for  that  problem  is  a  three-degree-of- 
freedom  space,  consisting  of  two  translational  and  one  rotational  degrees  of  freedom. 
Actions  are  specified  as  changes  in  position  and  orientation.  The  goal  is  the  range  of 
positions  and  orientations  for  which  the  peg  is  directly  over  the  hole.  This  is  a  fairly 
small  volume  in  the  three-dimensional  state  space. 

In  general  in  this  thesis  we  will  assume  that  a  continuous  state  space  is  some 
bounded subset of $\Re^n$, for appropriate dimension $n$. Such a space corresponds
naturally to a system with several translational degrees of freedom, but no rotational
degrees of freedom. However, natural generalizations to $n$-dimensional manifolds
exist. See, among others, [Loz81], [ScShII], and [Can89].

Discrete  Space 

An example of a discrete state space is given by the stable orientations of a
polyhedral part resting on a horizontal table under the influence of gravity. Figure
2.1 depicts the planar case. The figure shows three stable orientations of a planar
part resting on a horizontal table. By tilting the table for a short amount of time the
part can be made to roll between different such configurations. While the analysis of
the forces required to move the part may require consideration of a continuous space,
once this analysis has been performed, it is sufficient to consider the resulting discrete
space in planning operations to orient the part stably. This example is taken from
[EM].

Figure 2.2: A two-dimensional peg-in-hole problem. Also shown are three states that
might be used in a discrete approximation to the continuous problem.

Discrete  representations  also  arise  as  approximations  to  continuous  spaces.  For 
instance,  one  might  place  a  fine  tiling  over  a  continuous  state  space,  then  regard  each 
of  the  tiles  as  a  state  in  a  discrete  state  space.  Finally,  sometimes  tasks  formulated 
in  continuous  spaces  may  be  transformed  naturally  into  a  discrete  representation. 
For instance, consider the planar task of inserting a two-dimensional peg into a two-dimensional
hole (see figure 2.2). Assume that the peg can only translate, but not
rotate.  If  the  peg  has  made  contact  with  the  horizontal  edges  near  the  hole,  then 
the  problem  can  be  represented  as  a  three-state  system.  One  state  corresponds  to 
contact  with  the  edge  to  the  left  of  the  hole,  another  state  corresponds  to  contact 
with  the  edge  to  the  right  of  the  hole,  and  the  third  state  corresponds  to  entry  into 
the  hole.  While  this  representation  discards  some  information,  such  as  the  distance 
of  the  peg  from  the  hole,  it  still  retains  the  basic  geometrical  relationships  required 
to  attain  the  hole. 

The  discrete  spaces  treated  in  this  thesis  are  assumed  to  be  finite.  Thus  a  discrete 
state space is simply a finite set $S = \{s_0, s_1, s_2, \ldots, s_n\}$, for some $n$. Most of the
development  of  the  theory  of  probabilistic  strategies  will  be  done  on  finite  discrete 
spaces  (see  chapter  3).  This  is  primarily  a  device  for  simplifying  the  presentation.  The 
results  carry  over  with  appropriate  modifications  to  continuous  spaces.  The  extension 
to  continuous  spaces  is  handled  in  chapter  4. 

2.2.2  Actions 

Actions are transformations on the state space. There are three broad classes
of actions: deterministic, non-deterministic, and probabilistic. In some sense, the
category of non-deterministic actions includes deterministic and probabilistic actions
as special cases. Another special case is given by non-deterministic actions whose
underlying non-determinism is constrained. These actions fall under the category of
partial adversaries, which we discuss below as well.

In terms of information content, the ordering of action categories by decreasing
certainty is: Deterministic, Probabilistic, Partially Adversarial, Non-Deterministic.

Deterministic  Actions 

A deterministic action maps each state of the state space to some state. This is
most easily represented in the discrete case. If $s \in S$ is a state, and $A$ is an action, then
$A(s)$ is again a state in $S$ (possibly $s$ itself). For instance, in the three-state peg-in-hole example of
figure 2.2, an action might correspond to the operation Move-Right. Denote the
three states by $s_{\text{right}}$, $s_{\text{left}}$, $s_{\text{hole}}$, corresponding to contact with the edge to the right of
the hole, contact with the edge to the left of the hole, and entry into the hole, respectively.
Then one might have that Move-Right($s_{\text{right}}$) = $s_{\text{right}}$, Move-Right($s_{\text{left}}$) = $s_{\text{hole}}$,
and Move-Right($s_{\text{hole}}$) = $s_{\text{hole}}$.
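
As an illustration only, the three-state example can be written down as a lookup table. The following is a minimal sketch; the string labels and the helper name `apply_action` are our own hypothetical choices and do not appear elsewhere in the thesis.

```python
# Hypothetical encoding of the three-state peg-in-hole example of figure 2.2.
S_LEFT, S_RIGHT, S_HOLE = "s_left", "s_right", "s_hole"

# A deterministic action is a total function on the state set,
# so a dictionary from state to successor state suffices.
MOVE_RIGHT = {
    S_RIGHT: S_RIGHT,  # already right of the hole: stays there
    S_LEFT: S_HOLE,    # slides from the left edge into the hole
    S_HOLE: S_HOLE,    # once in the hole, remains in the hole
}


def apply_action(action, state):
    """Return the unique successor A(s) of a state under a deterministic action."""
    return action[state]


final_state = apply_action(MOVE_RIGHT, S_LEFT)
```

Because the action is a function, composing a sequence of actions amounts to repeated lookup, with no sets of possible outcomes to track.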

In  the  continuous  case,  executing  an  action  generally  entails  performing  some 
operation  over  some  duration  of  time.  For  instance,  for  a  simple  first-order  linear 
system,  an  action  may  correspond  to  executing  a  velocity  over  some  time  interval.  In 
that case, if $x \in \Re^n$ is a state of the system, then an action is of the form $(v, \Delta t)$,
and the effect of an action is to move $x$ to the state $x + \Delta t\, v$.

Non-Deterministic  Actions 

A  non-deterministic  action  is  a  relation  on  the  state  space  rather  than  a  function. 
It  transforms  each  state  to  a  set  of  states.  The  purpose  of  a  non-deterministic 
action  is  to  model  uncertainty.  This  may  correspond  either  to  non-determinism  in 
the  transitions  specified  by  the  action,  or  it  may  simply  correspond  to  a  paucity  of 
knowledge  in  modelling  these  transitions.  In  the  discrete  case  we  will  write  the  effect 
of a non-deterministic action as $F_A(s)$. This is called the forward projection of the
state  s  under  the  action  A.  The  forward  projection  is  a  subset  of  the  state  space.  A 
similar  representation  exists  for  the  continuous  case,  although  now  the  action  must 
also  include  a  time  parameter. 
Figure 2.3: Graphical representation of a non-deterministic action $A_1$.


Figure 2.3 depicts a four-state system, in which action $A_1$ non-deterministically
maps state $s_0$ to the three other states. In other words, $F_{A_1}(s_0) = \{s_1, s_2, s_3\}$.

A  non-deterministic  action  measures  the  worst-case  behavior  of  the  system. 
Nothing is said about the actual likelihood that a particular transition will be taken.
In other words, if a state $s_j$ appears in the set $F_A(s)$, then one must assume that
action $A$ might cause state $s$ to move to state $s_j$. However, one cannot be sure that
this will ever occur.

One view is to imagine an adversary, who can force state $s$ to move to state $s_j$
whenever this would be to one's disadvantage, but who also can move $s$ to some other
state in $F_A(s)$ whenever one would actually like to attain $s_j$. This is what is meant
by a worst-case modelling of an action.
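
The worst-case reading can be made concrete with a small sketch. Here a non-deterministic action is stored as a map from each state to its set of possible successors, following the four-state system of figure 2.3. The transitions out of the states other than the first are not given in the text; the self-loops below are placeholders we assume only so that the relation is defined everywhere.

```python
# Hypothetical encoding of the four-state system of figure 2.3.
A1 = {
    "s0": {"s1", "s2", "s3"},
    # Placeholder self-loops (an assumption, not taken from the figure).
    "s1": {"s1"},
    "s2": {"s2"},
    "s3": {"s3"},
}


def forward_projection(action, states):
    """F_A(K): union of the possible successors of every state in K."""
    result = set()
    for s in states:
        result |= action[s]
    return result


def guaranteed_to_reach(action, state, goal):
    """Worst-case test: the adversary picks the successor, so every
    possible successor must lie inside the goal set."""
    return forward_projection(action, {state}) <= set(goal)


projection = forward_projection(A1, {"s0"})
```

Under this model, attaining a single desired successor from the first state cannot be guaranteed, since the adversary may always choose one of the other two.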

Partial  Adversaries 

As  we  have  indicated,  the  non-deterministic  representation  of  actions  provides  a 
worst-case  view  which  may  considerably  overestimate  the  uncertainty  in  the  actions. 
For instance, consider a first-order linear system in $\Re^2$, governed locally by the
equation $x(t) = x_0 + t\,v$, where $x_0$ is the starting state, $v$ is the actual velocity of the
system, and $t$ is the elapsed time. Suppose in fact that the starting state is the origin,
and that the action consists of commanding the nominal velocity $(1, 0)$ for some time
interval $\Delta t$. Suppose that the effect of this action is modelled non-deterministically.
In particular, any velocity of the form $(1, \epsilon)$ can result, where $\epsilon \in [-0.25, 0.25]$. Now
imagine that one repeatedly commands this action, say 1000 times, each time for
duration $\Delta t = 1$. The non-deterministic representation says little about the actual
location of the system after these 1000 actions. All one can say for sure is that the
$x$-position will be 1000, while the $y$-position will lie in the range $[-250, 250]$. See
figure 2.4 for the state of the system at $t = 6$.

Figure 2.4: This figure shows the possible locations of the system after executing
a commanded velocity subject to uncertainty for 6 time units. The commanded
velocity is $(1, 0)$. The effective velocity is given non-deterministically by $(1, \epsilon)$, with
$\epsilon \in [-0.25, 0.25]$. The figure also shows the final location if the error $\epsilon$ is fixed. In
this case the resulting motion is repeatable.

Indeed, if an adversary could at each instant in time choose $\epsilon$ arbitrarily within the
range $[-0.25, 0.25]$, then this is the best possible prediction of the future state of the
system. Yet, it may turn out that the system cannot actually behave in this worst-case
manner. In particular, the non-deterministic representation of the velocity as $(1, \epsilon)$
may be due to a fixed but unknown bias in the control system. Thus, after executing
the velocity for time $t = 1000$, the system is actually at the location $(1000, 1000\epsilon)$,
with fixed $\epsilon \in [-0.25, 0.25]$. Offhand, this case may not seem any better than before;
the prediction of the final state of the system again places the $y$-coordinate somewhere
in the range $[-250, 250]$. However, if one could make observations of the system's
position at some time after initiating the motion, then one could accurately predict
the final location of the system. More importantly, the action is repeatable. In other
words, whenever the system starts at the origin, subject to the commanded velocity
$(1, 0)$ for time $t = 1000$, the system will wind up at the location $(1000, 1000\epsilon)$, where
$\epsilon$ is some fixed number in the range $[-0.25, 0.25]$.

One  way  to  view  the  previous  example  is  to  realize  that  the  non-deterministic 
choices  possible  at  any  instant  in  time  are  coupled.  In  this  example,  nature  cannot 
choose  the  velocities  arbitrarily  at  every  instant  in  time.  Instead,  the  fixed  bias 
constrains  these  choices  over  time.  Only  the  bias  itself  is  arbitrary  and  unknown.  Said 
differently,  the  underlying  uncertainty  does  not  behave  like  a  worst-case  adversary, 
but  merely  like  a  partial  adversary.  Choices  made  by  the  adversary  constrain  further 
choices.  From  a  predictive  point  of  view  one  may  still  wish  to  model  the  system  in  a 
worst-case  manner.  However,  one  can  often  take  advantage  of  the  coupling  between 
the  unknown  parameters  of  the  system,  without  initially  knowing  the  instantiation 
of  these  parameters.  In  the  previous  example  this  advantage  takes  the  form  of  being 
able  to  execute  an  action  repeatably.  We  will  demonstrate  another  example  involving 
sensing  biases  in  section  2.4. 
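
The difference between the two readings of the same error bound can be sketched numerically. The code below is a minimal illustration under stated assumptions: the function names are ours, the commanded velocity is the nominal one from the example, and the single intermediate observation is assumed to be exact, which the text does not require in general.

```python
def worst_case_y_range(steps, dt=1.0, err_max=0.25):
    """Bounds on y when an adversary may choose a fresh error in
    [-err_max, err_max] at every step (full worst-case model)."""
    reach = steps * dt * err_max
    return (-reach, reach)


def run_fixed_bias(err, steps, dt=1.0):
    """Final position when one fixed, unknown error applies to every step
    (partial adversary). The commanded velocity is (1, 0)."""
    x = y = 0.0
    for _ in range(steps):
        x += dt * 1.0
        y += dt * err
    return (x, y)


def predict_final_y(observed_y, observed_t, final_t):
    """With a fixed bias, one exact observation at time observed_t > 0
    pins down the error, and hence the final y-coordinate."""
    err = observed_y / observed_t
    return err * final_t


bounds = worst_case_y_range(1000)
first_run = run_fixed_bias(0.125, 1000)
second_run = run_fixed_bias(0.125, 1000)
prediction = predict_final_y(observed_y=0.75, observed_t=6, final_t=1000)
```

Both models give the same a priori bounds on the final position. Only the fixed-bias model makes the two runs coincide and lets an early observation predict the outcome.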

One  should  realize  that  this  is  a  particularly  simple  example.  In  general  there 
may  be  several  components  to  an  error.  Some  of  these  may  behave  adversarially, 
some  may  behave  like  partial  adversaries,  and  some  may  behave  probabilistically  (see 
the  next  paragraph).  For  instance,  it  is  quite  common  to  have  an  error  that  consists 
of  biased  noise.  In  this  case  the  bias  is  like  a  partial  adversary,  while  the  noise  is 
probabilistic. 

Probabilistic  Actions 

Probabilistic  actions  are  a  special  case  of  non-deterministic  actions,  in  which  it  is 
possible  to  assign  a  probability  density  function  to  the  forward  projection.  Consider 
in the discrete case the forward projection $F_A(s)$ of some state. This set is of the
form $F_A(s) = \{s_1, \ldots, s_q\}$, for some set of states $s_1, \ldots, s_q$. For a probabilistic action
$A$, one can assign to each state $s_i$ a probability $p_i$. This means that if the system
is initially in state $s$, and one executes action $A$, then state $s_i$ will be attained with
probability $p_i$.

A  probabilistic  representation  of  an  action  carries  with  it  considerably  more 
information  than  does  a  non-deterministic  model.  Clearly  not  all  actions  may  thus  be 
modelled.  For  instance,  in  the  example  of  figure  2.4,  if  the  error  in  the  commanded 
velocity  is  indeed  a  fixed  but  unknown  bias,  then  one  cannot  model  it  as  a  probabilistic 
action.  However,  if  the  error  is  due  to  noise,  with  a  known  bias,  then  it  makes  sense 
to think of the error $\epsilon$ as a random variable in the range $[-0.25, 0.25]$. In that case,
the extra information provided by the probabilistic representation manifests itself via
the central limit theorem. In particular, suppose that the basic action consists of
commanding the velocity $(1, 0)$ for time $\Delta t = 1$. Now imagine applying this action
1000 times consecutively. Then the central limit theorem tells us that the $y$-coordinate
of the final position of the system will be approximately normally distributed about $1000\mu_\epsilon$. Here
$\mu_\epsilon$ is the expected value of $\epsilon$, that is, the bias in the noise.
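
To see how much is gained, one may work the numbers under two assumptions not stated explicitly above: the per-step errors $\epsilon_1, \ldots, \epsilon_N$ are independent and identically distributed, with mean $\mu_\epsilon$ and variance $\sigma_\epsilon^2$. Writing $y_N$ for the $y$-coordinate after $N$ steps, a sketch of the calculation is:

```latex
\[
y_N = \sum_{i=1}^{N} \Delta t\,\epsilon_i, \qquad
\mathrm{E}[y_N] = N\,\Delta t\,\mu_\epsilon, \qquad
\mathrm{Var}[y_N] = N\,\Delta t^{2}\,\sigma_\epsilon^{2}.
\]
\[
N = 1000,\ \Delta t = 1: \qquad
y_N \;\approx\; \mathcal{N}\!\left(1000\,\mu_\epsilon,\; 1000\,\sigma_\epsilon^{2}\right),
\qquad
\sqrt{\mathrm{Var}[y_N]} = \sqrt{1000}\,\sigma_\epsilon \;\le\; \sqrt{1000}\cdot 0.25 \;\approx\; 7.9 .
\]
```

The bound $\sigma_\epsilon \le 0.25$ holds because the error is confined to an interval of length $0.5$. The spread about the mean therefore grows like $\sqrt{N}$, whereas the worst-case interval $[-250, 250]$ grows like $N$.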

2.2.3 Sensing

Sensing  aids  in  reducing  uncertainty.  A  system  that  observes  its  behavior  can 
sometimes  compensate  for  errors  in  control.  However,  uncertainty  enters  into  sensing 
as  well.  We  will  consider  a  spectrum  of  sensing  uncertainty,  analogous  to  the  various 
forms  of  action  uncertainty.  Specifically,  of  interest  are  perfect  sensing,  sensing  with 
probabilistic  errors,  sensing  with  non-deterministic  errors,  and  sensorless  systems, 
that is, systems with infinite sensing uncertainty. Closely related to the sensorless
systems  are  near-sensorless  systems,  in  which  there  is  just  enough  sensing  to  detect 
task  completion. 

In  terms  of  information  content,  the  ordering  of  sensing  categories  by  decreasing 
certainty is: Perfect, Probabilistic, Non-Deterministic, Nearly-Sensorless,
Sensorless. As with control uncertainty, there are also Partially
Adversarial versions of non-deterministic sensing.

Perfect  Sensing 

A  perfect  sensor  is  one  that  reports  the  system’s  state  with  complete  accuracy.  It  is 
fairly  easy  to  plan  strategies  for  such  systems,  even  if  control  is  uncertain.  We  shall 
discuss  this  issue  further  below. 


Imperfect  Sensing:  Basic  Terms 

An  imperfect  sensor  is  a  sensor  that  returns  a  sensed  value  that  need  not  be  the  actual 
state of the system. Generally, given a state $x$ of the system, there is a collection of
sensor values $\{x^*\}$ that might be observed. For each sensed value $x^*$, the system can
infer that the actual state of the system must lie in some set of interpretations $I(x^*)$.
The  exact  nature  of  the  interpretation  set  depends  on  the  type  of  sensor. 

The  next  few  paragraphs  discuss  imperfect  sensors  in  more  detail,  as  well  as 
provide  examples  of  such  sensors. 
Figure 2.5: This figure shows the actual location $x$ of a system, along with an observed
sensor value $x^*$. The disk bounded by the solid circle depicts the range of possible
sensor  values  assuming  a  bounded  but  unknown  sensing  error.  The  disk  bounded  by 
the  dashed  circle  depicts  the  possible  interpretations  of  the  observed  sensor  value. 
Notice  that  the  actual  state  of  the  system  is  indeed  a  possible  interpretation  of  the 
observed  sensor  value. 


Imperfect  Sensing:  Non-Deterministic  Sensing 

In the non-deterministic case, for each actual state $x$ of the system, there is a collection
$E(x) = \{I(x^*)\}$ of possible interpretation sets that might result upon sensing. There
is one interpretation set $I(x^*)$ for each possible sensor value $x^*$. No further assumption
is made about the actual likelihood of observing a particular sensor value $x^*$. This is
analogous to the worst-case representation of uncertainty in actions. Similarly, each
interpretation set $I(x^*)$ is a set of possible states of the system. Again no assumption
is made about the actual likelihood that the system is in a particular state in the set
$I(x^*)$, given that $x^*$ has just been observed.

As  an  example,  imagine  that  the  state  space  is  a  subset  of  the  real  line.  Suppose 
that  whenever  the  actual  state  of  the  system  is  at  the  point  x,  then  the  range  of  sensor 
values that the system might observe is given by the interval $(x - \epsilon, x + \epsilon)$ for some
$\epsilon > 0$. This is sometimes referred to as an unknown but bounded model of uncertainty.
Clearly, if the system observes a sensor value $x^*$, then the set of interpretations of
that sensor value is given by the interval $I(x^*) = (x^* - \epsilon, x^* + \epsilon)$. Figure 2.5 depicts
a  two-dimensional  example. 
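
The one-dimensional example translates directly into code. The following is a minimal sketch; the function names are hypothetical, and the interval is taken to be open, as in the text.

```python
def interpretation_interval(sensed, err_bound):
    """I(x*) = (x* - err_bound, x* + err_bound): the open interval of
    states that could have produced the sensed value."""
    return (sensed - err_bound, sensed + err_bound)


def is_possible_state(x, sensed, err_bound):
    """True if state x is a possible interpretation of the sensed value."""
    lo, hi = interpretation_interval(sensed, err_bound)
    return lo < x < hi


interval = interpretation_interval(2.0, 0.5)
```

Note the symmetry of the model: a state is a possible interpretation of a sensed value exactly when that sensed value is a possible reading of the state.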


Imperfect  Sensing:  Probabilistic  Sensing 

A  probabilistic  sensor  is  an  imperfect  sensor  for  which  there  exists  a  probability 
density  function  over  the  range  of  possible  sensor  values.  For  instance,  given  a 
position $x \in \Re^n$, the range of sensor values $x^*$ might be described by a normal
distribution centered at $x$. Inverting this collection of distributions using Bayes' rule
allows one to construct for each sensor value $x^*$ a set of interpretations $I(x^*)$. This
set  of  interpretations  is  itself  a  probability  density  function  describing  the  likelihood 
that  the  system  is  in  state  x  given  that  one  has  observed  sensor  value  x*. 
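
Written out, the inversion takes the following form. The prior density $p(x)$ over states and the covariance $\Sigma$ of the sensor noise are assumptions introduced here for illustration; the text above does not fix either.

```latex
\[
p(x \mid x^*) \;=\; \frac{p(x^* \mid x)\,p(x)}{\int p(x^* \mid x')\,p(x')\,dx'} ,
\qquad
p(x^* \mid x) \;=\; \mathcal{N}\!\left(x^*;\, x,\, \Sigma\right).
\]
```

If the prior is taken to be uniform over the state space, the posterior is proportional to the sensor density regarded as a function of $x$, that is, a normal density centered at the observed value $x^*$, restricted to the bounded state space.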

Imperfect  Sensing:  Sensorless  and  Near-Sensorless  Tasks 

In  sensorless  tasks  there  is  no  sensing,  whereas  in  near-sensorless  tasks  there  is  no 
sensing  except  to  signal  goal  attainment.  Without  sensing  a  system  must  rely  entirely 
on its actions and predictive ability to attain the goal. In the near-sensorless case
this  is  essentially  true  as  well,  except  that  there  is  an  additional  bit  of  information 
which  signals  success  should  the  goal  ever  be  attained.  This  is  useful  for  systems  that 
repeatedly  execute  a  loop  that  has  some  chance  of  attaining  the  goal  but  that  is  not 
guaranteed  to  attain  the  goal.  See  below.  We  prove  later  (see  section  3.13.2)  that  the 
class  of  tasks  solvable  using  a  sensorless  system  is  very  much  like  the  class  of  tasks 
solvable  using  a  near-sensorless  system.  Of  course,  for  any  particular  task,  adding  a 
goal  recognizer  can  change  the  task  from  being  unsolvable  to  being  solvable. 

Any  open-loop  task  is  by  definition  a  sensorless  task.  For  instance,  the  gross 
motions  used  to  manipulate  objects  in  uncluttered  environments  are  examples  of 
sensorless  tasks.  Within  the  fine-motion  phase  of  assembly  an  example  of  a  sensorless 
task  is  the  process  of  orienting  parts  by  pushing  one  part  against  another.  This  is 
similar  to  the  palletizing  that  occurs  when  for  instance  luggage  containers  are  loaded 
onto  airplanes.  The  containers  are  rolled  onto  large  loading  lifts  that  lift  the  containers 
from  ground  level  up  to  the  cargo  door  of  a  plane.  The  containers  are  generally  not 
yet  oriented  properly  after  having  been  rolled  onto  the  loading  lifts.  However,  the 
platform  of  the  loading  lift  consists  of  motorized  wheels  that  push  the  container  into 
a corner of the lift assembly. The result is that the container is oriented properly
in  the  absence  of  any  sensing.  Many  feeder  mechanisms  operate  on  this  principle. 
[Mas85]  refers  to  such  operations  as  funnels.  Indeed,  a  funnel  for  filling  a  jar  with 
water  or  flour  is  a  classic  example  of  a  strategy  that  uses  task  mechanics  rather  than 
sensing  to  constrain  the  behavior  of  a  system. 

More  generally,  many  operations  involve  aspects  of  sensorless  strategies.  This  is 
because  often  some  mechanical  interaction  between  parts  occurs  below  the  resolution 
of available sensors. The motion of an object due to impact during a grasping operation
is  one  example. 

Examples of near-sensorless systems can easily be constructed from examples of
sensorless  systems.  Essentially  the  goal  recognizer  acts  as  a  verification  mechanism 
that  ensures  that  the  task  really  has  been  accomplished.  This  is  useful  particularly 
when  one’s  assumptions  about  the  task  mechanics  are  subject  to  uncertainty. 

In the context of this thesis, an example of a near-sensorless system is given by the
behavior of a randomized strategy such as the peg-in-hole strategy of chapter 1, once
the  sensors  no  longer  provide  useful  information  to  guide  the  assembly.  Essentially 
the  strategy  is  operating  without  any  relevant  sensing.  However,  the  goal  recognizer  is 
used to terminate the strategy. In the peg-in-hole case, goal recognition was achieved
by  noting  that  the  camera  image  indicated  that  the  peg  had  entered  the  hole. 


2.3  Strategies 

Of great importance is the process by which one synthesizes strategies for the various
types  of  tasks  discussed  above.  Part  of  the  question  is  the  definition  of  a  strategy. 

2.3.1  Guaranteed  Strategies 

Traditionally,  guaranteed  strategies  and  optimal  strategies  have  been  the  focus  of 
attention.  These  in  turn  may  be  subdivided  by  the  manner  in  which  they  treat 
sensory  and  predictive  information.  At  one  extreme  is  a  strategy  that  makes  full  use 
of  sensing  history  and  forward  projections  of  the  current  state.  At  the  other  extreme 
is a simple feedback loop, which is a strategy that only considers current sensory
information  in  making  decisions. 

Recall  that  by  a  guaranteed  strategy  we  mean  a  set  of  possibly  conditional  actions 
that  are  certain  to  accomplish  a  task  in  a  bounded  predetermined  amount  of  time. 

2.3.2  Randomized  Strategies 

This  thesis  introduces  a  class  of  strategies  complementary  to  guaranteed  strategies, 
known  as  randomized  strategies.  One  of  the  characteristics  of  a  guaranteed  strategy 
is  that  it  attains  its  goal  in  a  bounded  predetermined  number  of  steps.  In  contrast,  a 
randomized  strategy  consists  of  a  sequence  of  operations  that  only  has  some  non-zero 
probability  of  attaining  its  goal.  The  key  to  success  with  a  randomized  strategy  is 
to  place  a  loop  around  this  sequence  of  operations.  This  means  that  one  repeatedly 
executes  the  sequence  of  operations  inside  the  loop  until  the  sequence  eventually 
succeeds. 
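
The loop structure can be sketched as follows. This is only an illustration: `try_once` stands for one pass through the sequence of operations together with a goal recognizer that reports success, the iteration cap is a practical safeguard we add, and the success probability used in the example is hypothetical. The expected-iteration formula additionally assumes that passes succeed independently with the same probability.

```python
import random


def randomized_loop(try_once, max_iterations):
    """Repeat a sequence of operations until it reports success.
    Returns the number of passes used, or None if the cap is reached."""
    for k in range(1, max_iterations + 1):
        if try_once():
            return k
    return None


def expected_iterations(p):
    """Mean number of passes when each pass succeeds independently with
    probability p (mean of a geometric distribution)."""
    return 1.0 / p


# Hypothetical example: each pass attains the goal with probability 0.2.
rng = random.Random(0)  # seeded so that the run is reproducible
passes_used = randomized_loop(lambda: rng.random() < 0.2, max_iterations=10_000)
```

Under these assumptions the probability of still not having succeeded after $k$ passes is $(1 - p)^k$, which decays geometrically. This is why a non-zero per-pass success probability is enough to ensure eventual convergence.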

A  key  ingredient  to  randomized  strategies  is  active  guessing  or  randomization. 
This  takes  the  form  of  either  guessing  the  location  of  the  system  or  of  executing 
an  action  that  has  been  randomly  selected  from  some  applicable  set  of  actions. 
Guessing  the  location  of  the  system  is  a  means  of  compensating  for  uncertain  sensing 
information.  Executing  a  random  action  is  a  means  of  avoiding  getting  stuck  in  some 
location  from  which  there  is  no  guaranteed  strategy  of  escape.  Clearly  one  may  draw 
connections  between  these  two  forms  of  randomization. 

The  motivation  for  considering  randomized  strategies  is  to  increase  the  class 
of  solvable  tasks,  to  reduce  knowledge  requirements,  and  to  simplify  the  planning 
process.  This  is  facilitated  in  two  ways.  First,  by  not  insisting  on  guaranteed  plans, 
one  automatically  broadens  the  class  of  tasks  for  which  one  can  provide  solutions, 
although  the  solutions  are  now  solutions  in  a  probabilistic  sense.  Second,  by  actively 
randomizing  at  both  the  sensing  and  action  levels,  one  can  reduce  the  knowledge 
details needed to solve a task. This makes it easier to plan solutions to tasks for
which there exist guaranteed solutions. In addition, it permits some tasks, for which
there are no guaranteed solutions, to be solved in an expected sense. In effect,
randomization blurs the details of the environment. For instance, in the peg-in-hole
problem of figure 2.2, if the horizontal edges contain slight nicks, then the peg
could become stuck while sliding. Rather than plan for every possible nick explicitly,
it makes sense to invoke some type of randomizing action that is likely to start the
peg sliding again.

Figure 2.6: This figure depicts schematically how a system might update its run-time
knowledge state using both prediction and sensing. First, the system forward projects
the previous knowledge state $K_1$ using the current action $A$. Second, the system
intersects the resulting set $F_A(K_1)$ with the interpretations of the current sensed
value $x^*$. $K_2$ is the updated knowledge state.


2.3.3  History  and  Knowledge  States 

We  mentioned  above  that  strategies  may  be  classified  by  their  use  of  history.  This 
applies  both  to  guaranteed  strategies  and  to  randomized  strategies.  Another  way 
to  phrase  this  is  to  characterize  the  knowledge  state  of  the  system  at  run-time.  A 
knowledge state is always some subset of the state space. It reflects the certainty with
which  the  system  knows  its  actual  state.  In  the  case  of  perfect  sensing,  the  knowledge 
state  is  a  singleton  set  containing  the  actual  state  of  the  system.  More  generally,  a 
knowledge  state  can  be  an  arbitrary  subset  of  the  state  space. 

Systems  differ  in  the  manner  by  which  they  update  their  knowledge  states.  A 
simple  feedback  loop  only  considers  current  sensed  values.  Thus  the  knowledge  state 
of a simple feedback loop is always the most recent sensory interpretation set $I(x^*)$,
where  x*  is  the  most  recently  observed  sensor  value. 

A  system  that  makes  full  use  of  sensing  history  updates  its  knowledge  state  by 
forward projecting the previous knowledge state and intersecting it with the current
sensory  interpretation  set.  We  will  state  this  semi-formally  for  the  discrete  case,  in 
the  next  paragraph.  A  similar  description  exists  for  the  continuous  case;  it  is  depicted 
pictorially  in  figure  2.6.  Both  these  descriptions  apply  to  non-deterministic  actions 
and  non-deterministic  sensing.  In  the  probabilistic  setting,  the  analogous  operation 
is  given  by  the  Kalman  filter  (see  [Brown],  for  instance). 

Turning  now  to  the  discrete  case,  suppose  that  the  most  recent  knowledge  state 
is $K_1$, that the action just executed is $A$, and that the sensory interpretation set is $I$.
The new knowledge state derived from this information is given by $K_2 = F_A(K_1) \cap I$.
In  other  words,  the  previous  knowledge  state  is  first  forward  projected  to  account  for 
any  changes  due  to  the  executed  action.  The  resulting  set  is  then  intersected  with  the 
sensory  information.  Updating  the  knowledge  state  in  this  manner  on  each  time  step 
ensures  that  full  use  is  made  of  sensing  history  and  of  predictive  ability,  within  the 
bounds  given  by  the  non-deterministic  description  of  sensing  and  action  uncertainty. 
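
The update rule above can be illustrated with a minimal Python sketch. The state space (five cells on a line), the action name `right`, and the helper names `forward_project` and `update_knowledge` are assumptions made purely for illustration; they are not taken from the thesis.

```python
# Hypothetical discrete world: states 0..4 on a line. The action "right"
# moves the system one or two cells (non-deterministically), clipped at 4.
STATES = range(5)
TRANSITIONS = {
    (s, "right"): {min(s + 1, 4), min(s + 2, 4)} for s in STATES
}


def forward_project(knowledge, action, transitions):
    """F_A(K): every state reachable from some state in K under the action."""
    reachable = set()
    for state in knowledge:
        reachable |= transitions[(state, action)]
    return reachable


def update_knowledge(knowledge, action, interpretation, transitions):
    """K2 = F_A(K1) intersected with the sensory interpretation set I."""
    return forward_project(knowledge, action, transitions) & set(interpretation)


k1 = {0, 1}          # previous knowledge state K1
i_set = {2, 3, 4}    # sensory interpretation set I after the motion
k2 = update_knowledge(k1, "right", i_set, TRANSITIONS)
print(sorted(k2))    # [2, 3]
```

Here the forward projection of {0, 1} is {1, 2, 3}; intersecting with the sensed set {2, 3, 4} rules out state 1, which is the sense in which history and prediction sharpen the current reading.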


2.3.4  Planning 

Planning  Guaranteed  Strategies 

Once  one  has  the  notion  of  a  knowledge  state,  planning  guaranteed  strategies  is 
conceptually  simple.  Specifically,  one  backchains  in  the  space  of  knowledge  states, 
starting  from  the  goal.  This  process  is  sometimes  referred  to  as  dynamic  programming. 
It  is  discussed  in  further  detail  for  the  discrete  context  in  section  3.2.4.  Chapter 
4  discusses  the  [LMT]  preimage  framework,  which  is  a  backchaining  approach  for 
computing  guaranteed  strategies. 

Briefly, backchaining proceeds as follows. Given a collection of goal states $\{G_\alpha\}$,
the planner determines all pairs of knowledge states and actions $(K, A)$ for which
attainment of one of the goals $G_\alpha$ is guaranteed. This means that for each sensory
interpretation set $I(x^*)$ that the run-time system might observe upon execution of
action $A$, the updated knowledge state lies inside a goal. Formally one must have
that $F_A(K) \cap I(x^*) \subseteq G_\alpha$ for some $\alpha$. The collection of all knowledge states $K$ that
satisfy  this  condition  comprises  a  new  collection  of  goal  states  for  the  next  level  of 
backchaining.  This  process  is  repeated  until  a  knowledge  state  is  constructed  that 
includes  the  initial  state  of  the  system,  or  until  there  are  no  further  knowledge  states 
to  be  constructed. 
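
The backchaining procedure just described can be sketched for a tiny discrete problem. Everything concrete below is an assumption chosen for illustration: a four-cell line, a single deterministic action `left`, a coarse sensor with two interpretation sets, and the convention that a reading can occur only if it is consistent with the forward projection. The sketch enumerates all subsets of the state space, so it is only practical for very small examples.

```python
from itertools import combinations


def nonempty_subsets(states):
    """Enumerate every non-empty subset of the state space as a frozenset."""
    ordered = sorted(states)
    for size in range(1, len(ordered) + 1):
        for combo in combinations(ordered, size):
            yield frozenset(combo)


def forward_project(knowledge, action, transitions):
    """F_A(K): union of the possible successors of each state in K."""
    out = set()
    for state in knowledge:
        out |= transitions[(state, action)]
    return frozenset(out)


def guaranteed(knowledge, action, goals, transitions, interp_sets):
    """True if F_A(K) & I lies inside some goal for every reading I that can occur."""
    projected = forward_project(knowledge, action, transitions)
    for interp in interp_sets:
        updated = projected & interp
        if not updated:
            continue  # reading inconsistent with F_A(K); assumed not to occur
        if not any(updated <= goal for goal in goals):
            return False
    return True


def backchain(states, actions, goal, initial, transitions, interp_sets):
    """Return {knowledge state: (action, level)} built backwards from the goal."""
    goals = [frozenset(goal)]
    plan = {}
    level = 0
    while True:
        level += 1
        found = {}
        for knowledge in nonempty_subsets(states):
            if knowledge in plan or knowledge in goals:
                continue
            for action in actions:
                if guaranteed(knowledge, action, goals, transitions, interp_sets):
                    found[knowledge] = (action, level)
                    break
        if not found:
            return plan          # no further knowledge states can be constructed
        plan.update(found)
        goals.extend(found)      # new goals for the next level of backchaining
        if any(initial in knowledge for knowledge in found):
            return plan          # some knowledge state now includes the start


STATES = range(4)
TRANSITIONS = {(s, "left"): {max(s - 1, 0)} for s in STATES}
INTERP_SETS = [frozenset({0, 1}), frozenset({2, 3})]
plan = backchain(STATES, ["left"], {0}, 3, TRANSITIONS, INTERP_SETS)
print(plan[frozenset({3})])      # ('left', 3)
```

In this toy instance the knowledge state {3} is reached at the third level of backchaining, which corresponds to the three `left` motions needed to move from cell 3 to the goal cell 0.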

Planning  Randomized  Strategies 

The  aim  of  this  thesis  is  to  analyze  randomized  strategies  and  explore  methods  for 
synthesizing  these  strategies.  In  the  context  of  this  thesis  randomization  takes  the 
form  of  either  guessing  the  current  state  of  the  system  or  of  executing  a  randomizing 
motion.  These  two  approaches  are  very  similar,  as  is  made  clear  by  considering 
knowledge  states.  As  an  example,  consider  again  the  discrete  representation  for 
the  peg-in-hole  task  of  figure  2.2.  Suppose  that  the  initial  knowledge  state  is 
$K = \{s_{\text{left}}, s_{\text{right}}\}$. This means that the system knows that it is on a horizontal
edge near the hole, but is unsure of which one. The state-guessing approach
consists of randomly guessing that the actual state is either state $s_{\text{left}}$ or state
$s_{\text{right}}$, then executing a motion designed to attain the goal from that state. The
randomizing-action  approach  consists  of  randomly  moving  either  left  or  right,  in  the 
hope  of  attaining  the  goal.  For  this  simple  example  the  two  approaches  are  trivially 
equivalent. 

More  generally,  this  example  suggests  that  both  state-guessing  and  action- 
randomization  may  be  viewed  as  the  random  selection  of  a  knowledge  state  that  is  a 
subset  of  the  system’s  actual  knowledge  state  at  run-time.  In  other  words,  suppose 
that  the  system  knows  that  it  is  located  somewhere  in  the  set  K,  and  suppose  further 
that  this  is  not  enough  information  to  accomplish  a  task  successfully.  Then  it  makes 
sense to guess between some collection of smaller knowledge states $K_1, \ldots, K_\ell$ that
cover $K$, assuming that for each of the knowledge states $K_i$ there is a strategy for
attaining the goal. Selecting one of the states $K_i$ may be viewed either as guessing
an artificial sensory interpretation set or as selecting a random sequence of actions.
The sensory interpretation set is just the set $K_i$, while the sequence of actions is
the plan associated with $K_i$ for attaining the goal. This suggests that the synthesis
of  randomized  strategies  may  be  built  on  top  of  the  backchaining  approach  used  to 
synthesize  guaranteed  strategies.  The  guaranteed  approach  is  simply  augmented  with 
an  additional  operator,  SELECT,  that  permits  the  system  to  make  random  choices. 
Additionally,  one  must  worry  about  whether  it  is  possible  to  repeat  this  guessing 
operation  should  the  first  guess  fail  to  attain  the  goal.  Chapter  3  examines  these 
issues  in  greater  detail,  while  section  2.6  later  in  this  chapter  provides  a  further 
outline. 


2.4  A  Randomizing  Example 

Let  us  continue  with  an  example.  The  purpose  of  this  example  is  to  demonstrate  the 
relationship  between  guaranteed  strategies,  local  progress,  and  randomization  in  a 
continuous  space.  The  scene  is  the  two  dimensional  plane.  The  state  of  the  system  is 
a  point  on  this  plane.  The  goal  is  a  circle  of  radius  r  centered  at  the  origin.  The  task 
consists  of  moving  the  system  into  the  goal.  This  representation  might,  for  instance, 
be  the  appropriate  formulation  of  the  problem  of  sliding  a  peg  towards  a  hole  on 
a  level  surface  surrounding  the  hole.  The  point  in  this  case  corresponds  to  some 
reference  point  on  the  peg,  while  the  plane  corresponds  to  the  two  degrees  of  sliding 
freedom  available  to  the  peg. 

If  sensing  and  control  are  perfect,  then  the  task  is  accomplished  by  sensing  the 
start  position,  then  moving  in  a  straight  line  towards  the  origin,  stopping  once  the 
circle  is  entered.  Suppose  however  that  sensing  is  imperfect.  Then  it  may  not  always 
be  clear  in  which  direction  to  move.  Let  us  look  at  a  special  case  involving  imperfect 
sensing,  while  retaining  the  assumption  of  perfect  velocity  control.  In  addition,  we 
will  assume  that  the  goal  is  independently  recognizable,  that  is,  if  ever  the  state  of 
the system enters the goal, then some sensor will signal goal attainment. In the
peg-in-hole example, this might be achieved by noting that the peg is falling into the hole,
that is, by using force sensors to detect that contact with the surrounding surface has
been broken. Another possibility is to sense the peg's height in the z-direction.

Figure 2.7: If there is a constant sensing bias and the system interprets the sensor as
correct, then the system may converge to a point other than the goal.

In  general  we  will  model  sensing  errors  as  error  balls.  Specifically,  we  will  assume 
that  if  the  actual  location  of  the  system  is  given  by  the  point  x,  then  the  sensor  will 
return a sensed value $x^* \in B_{\epsilon_s}(x)$, where $B_{\epsilon_s}(x)$ is the ball of radius $\epsilon_s$ centered at $x$.
As we have mentioned before, $B_{\epsilon_s}(x)$ represents the non-determinism in the system's
knowledge of the sensor. It may be the case that all possible positions in $B_{\epsilon_s}(x)$ could
be returned by the sensor, or simply that some subset could be returned. Further, the
sensor may return values probabilistically distributed over $B_{\epsilon_s}(x)$, or it may return
values  in  an  adversarial  manner.  Without  further  information,  the  system  must  plan 
as  if  the  sensor  is  actually  acting  as  an  adversary. 

Suppose,  however,  for  the  sake  of  this  example,  that  the  sensor  always  returns  the 
actual  location  of  the  system  offset  by  a  fixed  bias  b.  The  actual  bias  is  unknown 
to the system; only its maximum magnitude $b_{\max}$ is known. So one may take
$\epsilon_s = b_{\max} = \max_{\mathbf{b}} |\mathbf{b}|$. In what follows we will draw all figures as if $\mathbf{b} = (b, 0)$, with
$0 < b < b_{\max}$. However, this is just for convenience of exposition; the bias may lie
anywhere inside the disk of radius $b_{\max}$.

Now  consider  what  happens  if  the  system  continues  to  interpret  the  sensor  as 
correct.  See  figure  2.7.  If  the  system  is  at  location  x,  then  the  sensor  will  report  that 
the system is at $x + b$. Aiming for the origin, the system thus will move in a straight
line parallel to the vector $-(x + b)$. This line points directly from the actual location
$x$ to the point $-b$. If $b_{\max}$ is less than the radius of the goal, then the system will
still successfully attain the goal. So suppose that the point $-b$ lies outside of the
goal. It is still possible for the system to wind up in the goal, namely if and only if
the line connecting the two points $x$ and $-b$ passes through the goal circle of radius
r  (recall  that  there  is  no  control  error).  See  figure  2.8.  Thus  there  is  one  region  from 
which  this  strategy  is  guaranteed  to  attain  the  goal,  and  another  from  which  this 
strategy causes the system to converge to the point $-b$ (recall that the sensing error
is a pure bias, without any superimposed noise). Of course, if $b$ were known, then
the strategy could be modified to always achieve the goal, but $b$ is unknown. Only
$b_{\max}$ is known to the system. Let us denote the region from which the strategy is
guaranteed to attain the goal by $P$.

Figure 2.8: If the line from the system's starting configuration to the negative bias
passes through the goal, then the system will converge to the goal. Otherwise, it will
converge to the negative bias. This example assumes perfect velocity control.
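
The behavior of the naive strategy, which trusts the biased sensor, can be checked with a small simulation. This is only a sketch under assumptions introduced here: motion is discretized into short steps of length `step`, the function name `run_biased_servo` is hypothetical, and the numerical values of the bias and goal radius are arbitrary.

```python
import math


def run_biased_servo(start, bias, r, step=0.01, max_steps=100_000):
    """Follow the sensed direction to the origin; return (outcome, position)."""
    x, y = start
    for _ in range(max_steps):
        if math.hypot(x, y) <= r:
            return "goal", (x, y)          # goal is independently recognizable
        sx, sy = x + bias[0], y + bias[1]  # sensed position x* = x + b
        norm = math.hypot(sx, sy)
        if norm <= step:
            # The sensor reports (numerically) the origin: the system sits at -b.
            return "stuck", (x, y)
        # Perfect velocity control: move a short distance along -(x + b).
        x -= step * sx / norm
        y -= step * sy / norm
    return "stuck", (x, y)


bias = (2.0, 0.0)   # actual bias b, unknown to the system; -b lies outside the goal
r = 1.0             # goal radius

# The segment from (3, 0) to -b = (-2, 0) crosses the goal disk.
print(run_biased_servo((3.0, 0.0), bias, r)[0])    # goal

# The segment from (-5, 0) to -b never meets the goal disk.
print(run_biased_servo((-5.0, 0.0), bias, r)[0])   # stuck
```

The first start point lies in the region from which the goal is attained; the second lies outside it, and the simulated system comes to rest next to the point $-b$.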

Suppose  that  we  are  interested  in  a  simple  feedback  strategy  designed  to  attain 
the  goal,  by  making  judicious  use  of  sensors  and  randomizing  when  necessary.  In 
particular,  the  strategy  may  not  retain  any  past  sensing  information,  but  must  base 
all  its  decisions  on  current  sensed  values.  We  will  consider  such  a  situation  for  the 
discrete  case  in  section  3.12.3.  In  particular,  we  want  a  strategy  that  will  make 
progress  towards  the  goal  when  possible  and  otherwise  will  randomize  its  position. 
Consider  then  a  circle  of  radius  d,  centered  at  the  origin.  The  radius  d  is  to  be 
chosen  in  such  a  way  that  progress  is  possible  towards  the  goal  whenever  a  sensed 
value  lies  outside  of  the  circle,  while  progress  is  not  guaranteed  whenever  a  sensed 
value  lies  inside  the  circle.  We  will  discuss  choosing  d  as  a  function  of  control  and 
sensing  uncertainty  in  greater  detail  in  chapter  5.  For  the  current  example  it  makes 
sense to take $d = \epsilon_s = b_{\max}$. This is because whenever a sensed value appears within
$\epsilon_s$ of the origin, the system cannot be sure on which side of the origin the actual
position is located, and thus cannot decrease the distance to the origin. It is true
that the system can in general rule out locations that lie within the goal, and thus
using $d = \epsilon_s$ is overly conservative if one is merely interested in making progress
towards the goal, as opposed to making progress towards the origin. If one wanted to
take this added information into account then using $d = \sqrt{\epsilon_s^2 - r^2}$ is appropriate (see
figure 2.9). In either case, if a sensed value $x^*$ appears outside of the circle of radius
$d$, then commanding a velocity in the direction $-x^*$ is guaranteed to move all possible
interpretations of $x^*$, that is, all points in the region $B_{\epsilon_s}(x^*) - G$, closer towards the
goal $G$. Furthermore, one can move in the direction $-x^*$ for a total duration that
changes distance by less than $2(|x^*| - d)$, and still be sure that progress towards the
goal has been made, independent of the actual location $x \in B_{\epsilon_s}(x^*) - G$.

Figure 2.9: $\epsilon_s$ is the sensing uncertainty and $r$ is the goal radius. $d$ is the minimum
distance from the origin at which a sensed value must lie in order to guarantee progress
towards the goal. If velocity control is perfect, taking $d = \epsilon_s$ is sufficient, but this
figure shows that a smaller value of $d$ is often possible.

Now  consider  shifting  the  circle  of  radius  d  by  — b.  Denote  the  disk  circumscribed 
by  this  circle  by  D.  In  the  context  of  this  special  example,  this  disk  represents  the 
range  of  actual  positions  for  which  the  returned  sensor  readings  lie  within  distance 
d  of  the  origin.  Thus  the  disk  consists  of  those  locations  of  the  system  for  which 
the  simple  feedback  strategy  cannot  be  sure  of  making  progress  towards  the  goal. 
(Recall that the system knows $b_{\max}$ but not $b$.) Observe that if $d = \sqrt{\epsilon_s^2 - r^2}$ and
$b = b_{\max} = \epsilon_s$, then $D$ intersects the goal at the same points at which the boundary
of the guaranteed region $P$ intersects the goal circle. If $d$ is larger than this, or $b$ is
smaller, then the disk $D$ actually overlaps the region $P$. Thus there are three regions
that characterize the behavior of this simple feedback strategy: (1) the region $D$,
in which the strategy cannot guarantee progress, (2) the region $P$ (or some subset
thereof if $D$ overlaps $P$), in which the simple feedback strategy can guarantee both
progress and eventual goal convergence, and (3) the region $W = \mathbb{R}^2 - (G \cup P \cup D)$, in
which  the  strategy  can  guarantee  progress  locally  but  not  eventual  goal  attainment. 
For  this  example,  if  the  system  starts  off  in  W,  then  it  will  necessarily  enter  the  disk 
D,  simply  because  the  system  always  moves  towards  the  point  -b.  See  figure  2.10. 

The region D corresponds to a randomizing region. One possibility is for the
system  to  randomly  jump  to  some  location  whenever  it  finds  itself  unable  to  make 
progress,  that  is,  whenever  the  sensor  returns  a  value  within  distance  d  from  the 
origin.  Equivalently,  the  system  could  just  move  in  a  randomly  chosen  direction  for 
some  duration  of  time.  These  motions  should  be  so  chosen  that  there  is  a  non-zero 
probability of entering either the region $P$ or the goal $G$. For example, it may be
possible to randomly jump to some area $A$ surrounding the goal, in which case the
probability of entering the region $P$ is just the ratio of the areas, that is, $|P \cap A| / |A|$.
A typical execution trace of this strategy therefore consists of a series of straight-line
motions into the randomizing disk, each of which is followed by a random motion
out of the disk. Eventually one of these randomizing motions enters the preimage $P$,
whereupon entry into the goal is guaranteed. The expected time until success is on
the order of $|A| / |P \cap A|$ times the time required to execute a random motion. This
time may be on the order of the diameter of $A$.

Figure 2.10: Range of positions and sensor values for which the system cannot decide
in which direction to move. In the region D the system cannot make progress towards
the goal. From the region P goal attainment is certain. From the region W progress
is possible but not immediate goal attainment.
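
The expected-time estimate for the jumping strategy follows from a geometric distribution. The short derivation below is a sketch under two assumptions that the text suggests but does not state explicitly: successive jumps are independent, and each jump lands uniformly in the area $A$. The symbols $p$ and $N$ are introduced here for illustration.

```latex
% p : probability that a single random jump lands in the preimage P
% N : number of jumps until the first jump lands in P
\[
  p = \frac{|P \cap A|}{|A|},
  \qquad
  \Pr[N = n] = (1 - p)^{\,n-1}\, p, \quad n = 1, 2, \ldots
\]
\[
  \mathbb{E}[N]
  = \sum_{n=1}^{\infty} n\,(1 - p)^{\,n-1}\, p
  = \frac{1}{p}
  = \frac{|A|}{|P \cap A|}.
\]
```

Multiplying the expected number of jumps by the time needed for one jump and the subsequent straight-line motion gives the order-of-magnitude estimate quoted above.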

An  alternative  to  using  random  jumps  or  extended  random  motions  whenever  a 
sensed  value  does  not  permit  unambiguous  progress  towards  the  goal,  is  to  execute 
a  short  random  motion.  The  model  is  to  employ  a  simple  feedback  loop  in  which  all 
motions,  both  those  executed  deterministically  and  those  executed  randomly,  are  of  a 
fixed  short  duration.  This  view  of  randomization  follows  the  simple  guessing  strategy 
outlined  in  section  3.12.3.  In  the  current  context,  the  primitive  actions  are  simply 
motion  directions  executed  for  some  fixed  small  interval  of  time.  Guessing  between 
different  knowledge  states  entails  choosing  a  random  motion  direction.  A  simple 
feedback  strategy  that  does  not  retain  history  thus  does  not  have  the  capability 
of  executing  jumps  or  extended  motions.  Notice  that  this  type  of  strategy  has  a 
considerably  different  behavior  than  the  preceding  one.  In  particular,  if  the  system 
starts outside of the disk $D$, then it will head straight for the point $-b$, either attaining
the  goal  directly  or  entering  the  disk  D.  Once  inside  the  disk  D,  the  system  will  stray 
about  randomly  in  that  disk.  Essentially,  the  boundary  of  the  disk  forms  a  barrier 
that  is  not  crossed.  This  is  because  as  soon  as  the  system  moves  back  out  into  region 
W,  it  will  encounter  a  sensed  value  that  permits  progress  towards  the  goal,  thus 
sending  the  system  right  back  into  the  disk.  Thus,  this  strategy  effectively  amounts 
to  a  random  walk  inside  the  disk  D.  The  random  walk  eventually  crosses  over  into 
the goal $G$, whereupon the strategy terminates successfully. The expected time until
success is on the order of the non-goal area inside the disk, that is, $|D - G|$, times
perhaps a logarithmic factor, depending on the location of the goal.¹
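
The short-motion strategy can be simulated directly. The sketch below is illustrative only: the function name `simple_feedback_walk`, the step length, the seed, and the numerical values of the bias, $d$, and $r$ are all assumptions, and random directions are drawn uniformly, which the text does not prescribe.

```python
import math
import random


def simple_feedback_walk(x0, bias, d, r, step, rng, max_steps=200_000):
    """Simple feedback loop with short motions; returns (reached, steps, position)."""
    x, y = x0
    for n in range(max_steps):
        if math.hypot(x, y) <= r:          # goal is independently recognizable
            return True, n, (x, y)
        sx, sy = x + bias[0], y + bias[1]  # sensed position x* = x + b
        norm = math.hypot(sx, sy)
        if norm > d:
            # Sensed value lies outside the circle of radius d:
            # take a short step in the direction -x*.
            x -= step * sx / norm
            y -= step * sy / norm
        else:
            # Sensed value lies inside the circle: progress cannot be
            # guaranteed, so take a short step in a random direction.
            theta = rng.uniform(0.0, 2.0 * math.pi)
            x += step * math.cos(theta)
            y += step * math.sin(theta)
    return False, max_steps, (x, y)


rng = random.Random(0)
reached, steps, final = simple_feedback_walk(
    x0=(-4.0, 3.0), bias=(1.5, 0.0), d=1.5, r=1.0, step=0.1, rng=rng
)
print(reached, steps)
```

From the chosen start the system first moves in a straight line towards $-b$, then wanders inside the disk $D$ until the walk happens to cross into the goal. No history is kept between iterations; each step depends only on the current sensed value.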

We  thus  have  two  randomized  strategies,  of  apparently  different  character. 
Certainly  the  random  jumps  appear  to  be  of  significantly  different  character  than 
the  short  random  motions.  However,  one  can  view  a  random  jump  as  a  strategy  that 
randomly  guesses  the  current  state  of  the  system  then  executes  a  motion  designed 
to  attain  the  goal  assuming  the  guess  is  correct.  Similarly,  one  can  model  the 
extended  random  motions  as  sequences  of  actions  acting  over  short  periods  of  time. 
The  sequence  may  be  viewed  as  the  execution  of  a  strategy  with  history,  based  on 
a  randomly  selected  start  region.  In  this  manner,  these  randomizations  fit  nicely 
into  the  framework  developed  for  the  discrete  case  in  chapter  3.  In  summary,  one 
randomized strategy tries to escape the region $D$, in which sensing is useless, by
randomly moving to a new start location, while the other strategy tries to escape this
region by drifting across it towards the goal. The first may be viewed as randomization
with history, the second as randomization within a simple feedback loop.

¹This is similar to the expected time of $n^2 \log n$ required to attain the origin on a two-dimensional
$n \times n$ grid. See [Montroll].

Deciding  which  strategy  to  execute  depends  very  much  on  the  capabilities  available 
to  the  system,  as  well  as  the  expected  times  of  success.  For  instance,  if  the  preimage 
P  is  large  compared  to  the  area  A  into  which  the  system  jumps  randomly  and  if 
the area of the goal $G$ is small relative to the disk $D$, then it makes sense to randomize by
jumping.  Otherwise,  it  may  make  sense  to  randomize  by  performing  a  random  walk. 

An observation in favor of the random walk is the realization that for more general
sensing  and  control  uncertainties,  there  may  not  be  a  region  P  from  which  entry  into 
the  goal  is  guaranteed.  In  particular,  the  region  of  useless  sensing  may  include  the 
goal.  This  might  happen  if  the  actual  bias  has  a  magnitude  considerably  less  than  the 
maximum  possible  magnitude.  In  that  case,  even  though  the  strategy  can  guarantee 
progress  towards  the  goal  whenever  the  system  is  far  enough  away,  eventually,  as  the 
system  approaches  the  goal,  sensing  becomes  useless,  and  guaranteed  progress  must 
give way to random motions. In that case, both random jumps and random walks
succeed  only  by  actually  attaining  the  goal. 

What  is  interesting  about  this  example  is  that  both  these  randomized  strategies 
succeed  independent  of  the  actual  bias  b.  In  fact,  the  same  strategies  will  succeed 
independent of the distribution of actual sensor values in the ball $B_{\epsilon_s}(x)$. The speed
of  convergence  of  course  depends  on  the  precise  distribution  but  the  existence  of  a 
solution  does  not.  With  slight  modifications  the  strategies  can  be  made  to  succeed  in 
the  presence  of  certain  forms  of  control  uncertainty  as  well. 

This  strategy  is  an  example  of  the  form  to  be  discussed  in  section  3.12.4.  In 
particular,  the  strategy  takes  advantage  of  the  lack  of  an  adversary  who  can  forever 
keep  the  system  from  attaining  the  goal.  This  is  evident  in  the  assumption  of  a 
constant  sensing  bias.  The  bias  plays  the  role  of  an  unmodelled  system  parameter  that 
cannot assume worst-case values at every location in state space. For the case $b = b_{\max}$
and $d = \sqrt{\epsilon_s^2 - r^2}$, this assumption ensures that for some approach direction there
will be a guaranteed path to the goal. While this approach direction is not known
to  the  system,  the  randomized  motions  ensure  that  it  will  be  discovered  eventually. 
More  generally,  there  may  not  be  a  region  of  guaranteed  success.  In  this  case,  the 
random  walk  ensures  that  the  goal  will  be  attained  eventually.  (N.B.:  Implicit  in  this 
strategy  is  the  assumption  that  there  is  no  adversary  who  can  bias  the  commanded 
motions  sufficiently  that  they  act  in  a  non-random  fashion,  driving  the  system  away 
from  the  goal.) 

We  will  analyze  the  random-walk  strategy  again  in  chapter  5,  and  augment 
the  strategy  to  account  for  control  uncertainty.  Further,  assuming  particularly 
nice  distributions  of  sensing  and  control  uncertainty,  we  will  compute  the  expected 
progress  at  each  point.  The  rest  of  this  chapter  will  focus  more  on  the  manner  in 
which  both  guaranteed  and  randomized  strategies  are  computed  in  continuous  cases. 
It  is  hoped  that  the  example  has  provided  a  flavor  of  the  approach. 
Figure  2.11:  Given  perfect  sensing  and  control,  a  strategy  for  attaining  the  goal  is 
simply  a  path  to  the  goal. 


2.5  Simple  Feedback  Loops 

The  main  focus  of  this  thesis  is  to  develop  an  understanding  of  randomized  strategies. 
This  will  be  done  both  in  the  setting  of  full  history  and  in  the  setting  of  simple 
feedback  loops.  Section  2.3.4  (page  73)  explained  the  basic  approach  for  planning 
randomized  strategies  that  use  full  history,  with  further  details  appearing  in  chapter 
3.  This  section  is  devoted  to  a  quick  overview  of  simple  feedback  loops  with 
randomization. These were discussed in section 2.3.3. The region-attaining example
of  section  2.4  made  use  of  a  simple  feedback  loop.  The  basic  structure  of  a  simple 
feedback  loop  is  well  described  by  that  example.  In  particular,  a  randomized  simple 
feedback  loop  executes  actions  designed  to  make  progress  towards  a  goal  when  this 
is  possible,  and  otherwise  executes  a  random  motion.  The  simple  feedback  loop 
only  consults  current  sensed  values  in  making  its  decisions.  Again,  chapters  3  and  5 
examine  feedback  loops  in  greater  detail. 

2.5.1  Feedback  and  Uncertainty 

Feedback  in  a  Perfect  World 

The  example  of  section  2.4  provided  some  of  the  motivation  and  the  basic  approach. 
Let us now develop these ideas slightly further, as a prelude to chapter 3. Consider
first the setting of perfect control and perfect sensing. In such a perfect world a
strategy for attaining a goal might consist of a series of paths that lead from any
initial  state  to  the  goal.  See  figure  2.11.  One  might  for  instance  take  the  paths  to 
be  the  shortest  paths  to  the  goal.  Sensing  is  not  really  required  except  perhaps  to 
determine  the  starting  location  of  the  system. 

Feedback  with  Imperfect  Control 

As  one  relaxes  the  assumption  of  perfect  control,  sensing  becomes  useful  for  correcting 
errors introduced during a motion. Again, a planner may specify a strategy that
consists of a collection of paths that lead to the goal. Sensing is used at run-time
to determine which path the system is actually on at any instant. See figure 2.12.
One now has a true feedback strategy. At each instant of time the sensed state of
the system is used to decide on a proper course of action. The feedback strategy is a
simple feedback strategy since it does not make use of past sensed values.

Figure 2.12: This figure shows a snapshot of a feedback strategy in which control is
imperfect but sensing is perfect. At each instant the system determines a path to the
goal from the current state.

Observe  that  we  have  said  nothing  about  how  one  actually  comes  by  the  paths  that 
lead  to  the  goal.  In  the  perfect-world  case  these  might  come  from  a  standard  motion 
planner,  or  perhaps  a  shortest-path  planner.  In  the  perfect-sensing/imperfect-control 
world, one can use these same paths. In other words, the strategies determined for the
perfect  world  may  be  used  as  nominal  plans  in  the  imperfect  world.  While  it  is  true 
that  one  might  be  able  to  optimize  the  time  to  attain  the  goal  by  explicitly  replanning, 
using  for  instance  dynamic  programming,  this  is  not  generally  required  merely  to 
obtain a solution. Under simple bounds on the extent of the control uncertainty, and
simple  conditions  on  the  paths,  these  nominal  plans  suffice  to  guarantee  attainment 
of  the  goal.  The  conditions  may  be  summarized  by  saying  that  the  nominal  paths 
should  form  a  progress  measure  and  that  the  control  uncertainty  should  be  small 
enough  so  that  progress  is  possible  at  any  state  of  the  system.  By  a  progress  measure 
we  essentially  mean  a  scalar  function  that  is  continuous  over  the  state  space  and  that 
is  reduced  as  one  moves  along  any  given  path.  Distance  from  the  goal  is  one  such 
measure. See also the work by [Khatib] on potential functions.

Feedback  with  Imperfect  Control  and  Imperfect  Sensing 

Finally,  let  us  relax  the  assumption  of  perfect  sensing.  We  would  like  to  extend  the 
feedback  approach  outlined  above.  In  particular,  we  would  like  to  begin  with  a  set  of 
nominal  paths  or  plans  that  lead  from  any  location  to  the  goal.  The  nominal  paths 
serve  as  a  guide.  At  run-time  the  system  repeatedly  uses  sensing  to  determine  its 
actual  location  on  one  of  these  paths,  thereby  compensating  for  errors  introduced 
by  control  uncertainty.  This  is  a  classic  view  of  feedback.  However,  the  presence 
of  sensing  uncertainty  severely  complicates  the  picture.  The  system  now  cannot 
ascertain  precisely  on  which  path  it  is  located.  Instead,  there  may  be  a  collection  of 
paths  that  are  candidates  for  guiding  the  system  to  the  goal.  This  collection  is  given 
by  all  paths  that  intersect  the  sensory  interpretation  set.  So  long  as  all  these  paths 
point  in  essentially  the  same  direction,  the  system  can  find  a  motion  direction  which  is 
guaranteed  to  make  progress  relative  to  the  paths.  However,  it  may  easily  be  the  case 
that  some  paths  point  in  conflicting  directions,  so  that  the  system  cannot  ensure  that 
it  will  reduce  its  distance  to  the  goal.  This  was  the  gist  of  the  example  of  section  2.4. 
At  this  point  randomization  enters  into  the  picture.  If  the  system  cannot  guarantee 
progress relative to the nominal paths, then it should simply execute a randomizing
motion. This ensures that there is at least a possibility of making progress, no matter
where the actual location of the system is within the sensing uncertainty ball.

In  short,  we  will  think  of  a  simple  feedback  loop  as  a  feedback  strategy  that 
uses  a  progress  measure  to  move  towards  the  goal.  The  run-time  knowledge  state 
of  the  system  is  just  its  current  sensory  interpretation  set.  Whenever  progress  is 
possible  for  all  states  of  the  system  within  this  knowledge  state,  the  system  executes 
a  motion  to  make  progress.  Otherwise,  the  system  executes  a  randomizing  motion. 
Randomization  is  required  to  ensure  ultimate  goal  attainment.  This  type  of  a 
randomized  strategy  is  perhaps  the  simplest  sensor-based  strategy  imaginable.  It  is  a 
natural  generalization  of  the  feedback  strategies  used  with  perfect  sensing.  Strategies 
that  employ  history  in  making  decisions  are  conceptually  built  on  top  of  these  simple 
strategies.  In  particular,  randomization  serves  essentially  the  same  role  in  all  of  these 
strategies,  namely  as  a  device  to  continue  operation  even  when  decisions  cannot  be 
made  with  certainty.  It  is  merely  that  with  the  history-based  strategies  the  effective 
state  of  the  system  is  complicated  by  the  influence  of  past  information. 


2.5.2  Progress  in  Feedback  Loops 

The  Feedback  Loop 

The  basic  structure  of  a  randomized  simple  feedback  loop  is  given  by  the  following 
pseudo-routine. The routine assumes that there is a non-negative scalar progress
measure $\ell(x)$, defined at each point of the state space, that is zero at the goal. The
function $\ell$ is often referred to as a labelling in the rest of the thesis. In general,
additional conditions may need to be imposed on $\ell$, such as continuity and the
absence of local minima. Recall also that $F_A(x)$ is the set of all states to which $x$
might move under action $A$.

    REPEAT until goal attainment:
        Sense $x^*$.
        Let $I(x^*)$ be the possible locations of the system.
        FOR all actions $A$ do:
            For $x \in I(x^*)$, let $\Delta(x) = \max_{y \in F_A(x)} \{\ell(y)\} - \ell(x)$.
            If $\Delta(x) < 0$ for all $x \in I(x^*)$,
                then execute action $A$ and exit from the FOR loop.
        End-for
        If no action $A$ was executed,
            then randomly select an action to execute.
    End-repeat

Pseudo-code describing a simple feedback loop.

The  inner  FOR  loop  checks  whether  it  is  possible  to  make  progress  relative  to 
the  progress  measure.  If  this  is  not  possible,  then  a  random  action  is  executed.  This 
feedback  loop  assumes  that  goal  attainment  is  recognizable  upon  entry  into  the  goal. 
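
As a minimal sketch of the pseudo-code above, the loop might be written in Python as follows. The names `simple_feedback_loop`, `ell`, `forward`, `sense`, `interpret`, and `step` are illustrative assumptions rather than names from the thesis: `ell` stands for the progress measure, `forward(a, x)` for the forward projection of `x` under action `a`, and `interpret(reading)` for the set of possible locations given a sensed value. The seven-state line at the bottom, with perfect sensing and control, is a hypothetical toy instance.

```python
import random

def simple_feedback_loop(state, goal, actions, ell, forward, sense, interpret,
                         step, rng=random, max_iters=10_000):
    """Randomized simple feedback loop (sketch); returns (state, iterations)."""
    for iteration in range(max_iters):
        if state in goal:                      # goal attainment is assumed recognizable
            return state, iteration
        candidates = interpret(sense(state))   # states consistent with the reading
        chosen = None
        for a in actions:
            # Worst-case change in ell must be negative for every candidate state.
            if all(max(ell(y) for y in forward(a, x)) - ell(x) < 0
                   for x in candidates):
                chosen = a
                break
        if chosen is None:                     # no action guarantees progress
            chosen = rng.choice(actions)
        state = step(state, chosen)
    return state, max_iters

# Hypothetical toy instance: states -3..3 on a line, goal at 0.
def clamp(x):
    return max(-3, min(3, x))

result = simple_feedback_loop(
    state=3, goal={0}, actions=[+1, -1], ell=abs,
    forward=lambda a, x: {clamp(x + a)},
    sense=lambda x: x, interpret=lambda reading: {reading},
    step=lambda x, a: clamp(x + a),
)
```

With perfect sensing the random branch is never taken in this instance; it only comes into play once `interpret` returns sets that straddle the goal.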

Velocity  of  Approach 

The  synthesis  of  these  feedback  loops  is  trivial  assuming  that  a  progress  measure  is 
given.  Let  us  therefore  turn  to  an  analysis  of  such  loops.  The  key  issue  is  deciding 
how  fast  progress  is  made  towards  the  goal.  Thus  it  is  useful  to  define  the  velocity 
of approach at each state of the system. Intuitively, we would like the velocity $v_x$ to
measure  the  rate  at  which  progress  is  made  whenever  the  system  is  in  state  x.  We 
must  be  careful  to  define  this  quantity  in  a  meaningful  manner.  The  proper  definition 
depends  very  much  on  the  types  of  sensing  uncertainty  and  control  uncertainty  that 
are  in  effect. 

In  a  world  with  perfect  control  and  perfect  sensing,  the  velocity  of  approach  is  just 
the  change  in  the  progress  measure,  measured  along  the  path  to  the  goal.  The  velocity 
is  negative  whenever  progress  is  being  made.  This  velocity  has  a  useful  property.  In 
particular, one can integrate the quantity $-1/v_x$ over a path to the goal in order to
obtain the time required to attain the goal. This means that if for some $v$ the velocity
at each state $x$ satisfies $v_x < v < 0$, then the time to attain the goal is bounded by
$-d/v$, where $d$ is the maximum starting distance from the goal. We would like our
more  general  definition  to  possess  this  same  property. 
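
The bound can be checked in one line. As a sketch, assume the progress measure is the distance to the goal and parametrize the path by its value $\ell$, which decreases from a starting value $\ell_0 \le d$ to $0$ at rate $d\ell/dt = v_x$. The symbols $T$ (time to attain the goal) and $\ell_0$ are introduced here only for illustration.

```latex
T \;=\; \int_0^{T} dt
  \;=\; \int_{0}^{\ell_0} \left(-\frac{1}{v_x}\right) d\ell
  \;\le\; \int_{0}^{d} \left(-\frac{1}{v}\right) d\ell
  \;=\; -\frac{d}{v},
\qquad \text{since } v_x \le v < 0
\;\Longrightarrow\; 0 < -\frac{1}{v_x} \le -\frac{1}{v}.
```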

Much  of  the  material  in  sections  3.4,  3.5,  and  3.6  is  concerned  with  defining 
velocity  properly  and  establishing  the  bounding  property  just  mentioned.  There  is 
a  considerable  difference  between  the  probabilistic  setting  and  the  non-deterministic 
setting.

In the non-deterministic setting the natural definition of the velocity $v_x$ is as the
worst-case bound on the change in the progress measure whenever the system is in
state $x$. In particular, the velocity at a state $x$ is of the form:

$$v_x = \max_{\text{applicable actions } A} \; \max_{y \in F_A(x)} \{\ell(y) - \ell(x)\},$$

where $\ell$ is the progress measure as before. In order for this velocity to be negative,
each of the terms inside the maximization must be negative. This says that the
feedback loop is effectively a guaranteed strategy for attaining the goal. Given that
the progress measure $\ell$ is based on a collection of nominal plans developed for a
perfect world, one cannot actually expect that the velocities $\{v_x\}$ will all be negative.
This suggests that the natural setting for simple feedback loops is in the probabilistic
domain, rather than in the non-deterministic domain. Indeed, in the probabilistic
domain the definition of velocity leads to some interesting issues.
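
The worst-case velocity is easy to compute on a discrete space. The following sketch assumes hypothetical helpers `precise` and `sticky` that return the forward projection of a state under an action, and takes the progress measure to be the distance from the origin. It illustrates why a nominal plan rarely yields negative worst-case velocities: a single outcome in which the peg fails to move is enough to bring the velocity up to zero.

```python
def worst_case_velocity(x, actions, forward, ell):
    """Largest possible change in ell from state x over all actions and outcomes."""
    return max(ell(y) - ell(x) for a in actions for y in forward(a, x))

# Hypothetical example: a peg on a line that should move toward 0.
def precise(a, x):
    # Perfect control: the commanded step always happens.
    return {x + a}

def sticky(a, x):
    # Non-deterministic control: the peg may also fail to move.
    return {x + a, x}

v_precise = worst_case_velocity(3, [-1], precise, abs)
v_sticky = worst_case_velocity(3, [-1], sticky, abs)
```

Here `actions` is meant to hold only the actions the feedback loop might execute at `x`, matching the "applicable actions" in the definition.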


Random  Walks 

The  natural  domain  for  exploring  simple  feedback  loops  with  probabilistic  uncertainty 
is  in  the  setting  of  Markov  chains  and  their  continuous  counterparts.  This  is  because 
for  each  state  of  the  system,  the  simple  feedback  loop  described  above  defines  a  range 
of  probabilistic  transitions.  Each  transition  is  the  result  of  some  action  that  the  simple 
feedback  loop  might  execute.  An  action  is  executed  either  as  a  result  of  obtaining  a 
sensory  value  that  permits  making  progress,  or  as  a  result  of  randomly  selecting  an 
action.  Since  sensing  and  control  uncertainty  are  probabilistic,  the  net  result  is  a  set 
of  probabilistic  transitions. 

As  an  example,  consider  again  a  two-dimensional  peg-in-hole  task  for  which  the 
peg  is  in  contact  with  a  horizontal  edge  near  the  hole.  Suppose  that  we  have 
discretized  the  state  space,  as  indicated  in  figure  2.13.  In  a  perfect  world,  once 
the  peg  is  in  contact  with  a  horizontal  edge,  a  plan  for  attaining  the  goal  consists 
of  moving  left  if  the  peg  is  to  the  right  of  the  hole,  and  moving  right  if  the  peg  is 
to  the  left  of  the  hole.  There  are  thus  two  nominal  paths  for  moving  towards  the 
goal.  Said  differently,  a  progress  measure  is  given  by  the  system’s  distance  from  the 
goal.  Let  us  ignore  the  issue  of  control  uncertainty  and  instead  assume  simply  that 
the  peg's  motions  consist  of  moving  to  neighbor  states  in  the  discrete  representation 
of  its  state  space.  Now  let  us  instantiate  the  simple  feedback  loop  for  this  problem 
in  the  presence  of  sensing  uncertainty.  The  feedback  loop  is  based  on  the  distance 
progress  measure.  2 


2 We should note in passing that the strategy is slightly silly, given the low dimensionality of the
state space. However, it is a convenient example for illustrating the construction and character of a
simple feedback loop. A more complicated example was considered in section 2.4.

Figure 2.13: Discrete approximation of the horizontal state space of a peg-in-hole
problem. State “0” corresponds to the goal.


1.  Sense  the  current  horizontal  position. 

2.  Decide  on  a  direction  in  which  to  move: 

(a)  If  the  sensed  value  unambiguously  determines  the  peg’s 
position  to  be  to  the  left  of  the  hole,  then  decide  to  move 
right. 

(b)  If  the  sensed  value  unambiguously  determines  the  peg’s 
position  to  be  to  the  right  of  the  hole,  then  decide  to  move 
left. 

(c)  Otherwise,  randomly  pick  left  or  right. 

3.  Move one step in the direction selected by the previous step, while
simultaneously pushing down slightly.

4.  Repeat  steps  1  through  3  until  the  goal  is  achieved. 


A  simple  feedback  loop  for  inserting  the  peg  of  figure  2.13. 

Let us analyze this strategy. Suppose that the sensor is symmetric. Then it
suffices to consider the distance of the peg from the origin. Denote by $a$ the distance
of the peg's reference point from the origin. Let $p_a$ be the probability that the sensor
will return an unambiguous reading when the peg is located at distance $a$ from the
hole. By an unambiguous sensor reading we mean a sensed value $x^*$ all of whose
interpretations $I(x^*)$ lie either completely to the left or completely to the right of the
hole. Then the probability of moving towards the hole is

$$p_{\mathrm{hole}}(a) = p_a + \frac{1}{2}(1 - p_a) = \frac{1}{2} + \frac{1}{2}\, p_a.$$

Figure 2.14: A Markov chain model for the discrete peg-in-hole problem of figure
2.13.

Figure 2.14 shows the resulting system, modelled as a simple Markov chain. [Here
$p(i)$ is shorthand for $p_{\mathrm{hole}}(i)$, and $q(i) = 1 - p(i)$.]

The precise value of $p_a$ and thus of $p_{\mathrm{hole}}(a)$ depends on the sensor, of course.
Observe, however, that $p_{\mathrm{hole}}(a) > 1/2$ whenever $p_a > 0$. In short, there is a natural
drift towards the origin. Indeed, the expected change in the distance from the origin
is given by:

$$\begin{aligned}
\Delta a &= (-1)\, p_{\mathrm{hole}}(a) + (+1)\,(1 - p_{\mathrm{hole}}(a)) \\
         &= -2\, p_{\mathrm{hole}}(a) + 1 \\
         &= -p_a.
\end{aligned}$$

In other words, on average, the system decreases its distance from the goal by $p_a$
per step. It thus makes sense to define the velocity at the point $a$ to be $v_a = -p_a$.
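
The arithmetic can be checked exactly with rational numbers. In this small sketch the function names `p_hole` and `expected_change` are chosen for illustration, and the value $p_a = 1/3$ is an arbitrary example rather than a property of any particular sensor.

```python
from fractions import Fraction

def p_hole(p_a):
    """Probability of stepping toward the hole: unambiguous reading, or a lucky guess."""
    return p_a + Fraction(1, 2) * (1 - p_a)

def expected_change(p_a):
    """Expected one-step change in distance: -1 toward the hole, +1 away from it."""
    toward = p_hole(p_a)
    return (-1) * toward + (+1) * (1 - toward)

p_a = Fraction(1, 3)
drift = expected_change(p_a)
```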

We  see  in  this  example  one  of  the  key  issues  that  arises  in  the  analysis  of 
randomized  strategies,  in  particular,  of  simple  feedback  loops.  This  is  the  question  of 
whether  sensing  is  strong  enough  to  pull  the  system  towards  the  goal  on  average.  In 
this  one-dimensional  example  we  see  that  the  natural  drift  is  indeed  towards  the  goal 
everywhere. In more complicated spaces this need not always be the case. Part
of chapter 5 is devoted to analyzing one such example, based on the two-dimensional
problem of section 2.4. We will see that for nicely behaved sensing
and velocity errors, there is an unbounded annulus about the origin within which the
system  moves  towards  the  origin  on  average.  However,  once  the  system  lies  within  a 
certain  distance  of  the  origin,  the  sensing  information  becomes  less  useful.  Instead, 
the  randomizing  actions  tend  to  push  the  system  outward.  Although  eventually  the 
system  will  approach  arbitrarily  closely  to  the  origin,  the  natural  drift  is  away  from 
the  origin  on  the  average.  This  places  a  lower  bound  on  the  size  of  the  goal  region 
required  to  ensure  fast  convergence. 

More  generally,  one  can  define  the  expected  velocity  at  a  state  to  be  the  expected 
change  in  the  progress  measure.  A  considerable  portion  of  chapter  3  is  devoted  to 
proving  that  this  definition  of  velocity  in  the  probabilistic  setting  has  many  of  the 
same properties as does the usual notion of velocity in a deterministic world. In
particular, if the expected velocity at every state is bounded from above by some
number $v < 0$, then the expected time to attain the goal is bounded from above by
$-d/v$, where $d$ is the maximum starting distance from the goal.
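
For the one-dimensional peg chain this bound can be verified exactly. The sketch below makes two assumptions that are not in the text: the chain is truncated at a largest distance `n`, and an outward step at that boundary leaves the peg in place. Under those assumptions the expected velocity at every state is at most $-\min_a p_a$, so the expected time from distance $a$ should not exceed $a / \min_a p_a$. The function name `expected_hitting_times` is illustrative.

```python
from fractions import Fraction

def expected_hitting_times(p_unambiguous):
    """Expected steps to reach state 0 from each distance 0..n.

    p_unambiguous[a - 1] is the probability of an unambiguous reading at distance a.
    """
    toward = [Fraction(1, 2) + Fraction(p) / 2 for p in p_unambiguous]
    n = len(toward)
    # diff[a] = h(a) - h(a - 1), solved backwards from the boundary state n.
    diff = [Fraction(0)] * (n + 1)
    diff[n] = 1 / toward[n - 1]
    for a in range(n - 1, 0, -1):
        away = 1 - toward[a - 1]
        diff[a] = (1 + away * diff[a + 1]) / toward[a - 1]
    h = [Fraction(0)]
    for a in range(1, n + 1):
        h.append(h[-1] + diff[a])
    return h

half = expected_hitting_times([Fraction(1, 2)] * 10)   # p_a = 1/2 everywhere
sure = expected_hitting_times([1] * 5)                 # perfect sensing
```

With $p_a = 1/2$ everywhere the velocity bound is $v = -1/2$, so the expected time from distance $a$ should be at most $2a$; with perfect sensing it is exactly $a$.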

An  attractive  aspect  of  the  probabilistic  definition  of  velocity  is  that  it  captures 
the  notion  of  progress  on  the  average.  In  order  to  converge  to  a  goal  rapidly  a  strategy 
thus  need  not  make  progress  at  every  instant  in  time,  so  long  as  it  makes  progress  on 
the  average.  This  is  a  considerably  more  flexible  definition  than  what  is  available  in  a 
non-deterministic  world.  This  is  because  in  a  non-deterministic  world  all  constraints 
are  formulated  in  terms  of  worst-case  behavior.  One  desirable  trait  of  randomization 
in  general  is  that  it  permits  one  to  mix  the  notions  of  worst-case  and  average-case 
behaviors.  Thus  even  in  an  adversarial  world  one  can  sometimes  gain  an  advantage  by 
purposefully  randomizing  one’s  actions.  This  is  the  idea  put  forth  in  section  2.4.  Even 
though  one  may  not  be  able  to  ensure  progress  on  any  given  attempt,  by  randomizing 
one  can  at  least  ensure  progress  eventually,  and  in  some  cases,  one  can  ensure  progress 
on  the  average. 


2.6  Strategies  Revisited 

We  saw  in  section  2.2  that  there  are  essentially  four  dimensions  that  define  the  types 
of  tasks  that  arise  in  robot  motion  planning  with  uncertainty.  It  is  easy  to  confuse 
the  methods  for  these  different  problems,  so  let  us  recall  the  four  dimensions  briefly. 

One dimension corresponds to the level of uncertainty in the actions.
The categories of action uncertainty that we discussed were: Deterministic,
Probabilistic, Partially Adversarial, and Non-Deterministic. A second
dimension corresponds to the level of uncertainty in sensing. The categories of
sensing uncertainty that we discussed were: Perfect, Probabilistic, Partially
Adversarial, Non-Deterministic, Nearly-Sensorless, and Sensorless. A
third dimension corresponds to the type of strategy used to solve the task. The
two categories that we discussed were Guaranteed and Randomized. Finally,
the fourth dimension corresponds to the amount of history used by these strategies
in making their decisions. The two extremes that we discussed were given by
Full History and Simple Feedback. In some sense there is a fifth dimension,
corresponding to the type of state space, but we will ignore this dimension in the
current categorization since most of the results generalize from the discrete case to
the continuous setting.

Focusing  for  the  moment  on  the  two  dimensions  of  strategy  type  and  history  usage, 
the  following  table  describes  the  contribution  of  this  thesis. 


                     Strategy Type
    History     Guaranteed     Randomized
    None        LMT            Thesis
    Full        LMT; DP        Thesis

Focus of the thesis.

The  entry  “LMT;  DP”  refers  to  the  work  by  [LMT]  on  preimages  and  the  general 
dynamic  programming  approach  for  planning  guaranteed  or  optimal  strategies.  See 
chapter  4  for  a  discussion  of  preimages  in  the  continuous  domain  and  chapter  3  for  a 
discussion  of  dynamic  programming  in  the  discrete  domain. 

This thesis does not say much about the synthesis of guaranteed strategies that use
no  history.  In  general,  simple  feedback  loops  are  best  thought  of  in  the  probabilistic 
or  randomized  domains,  since  they  are  generally  not  guaranteed  to  converge  in  a 
predetermined  number  of  steps.  However,  some  work  has  been  done  in  this  area  in 
the  context  of  robot  motion  planning.  Clearly,  guaranteed  strategies  that  use  no 
history  may  be  viewed  as  a  special  case  of  preimage  planning  [LMT].  Other  special 
cases  and  extensions  are  discussed  in  [Erd84],  [Buc],  and  [Don89],  among  others. 

Turning to the dimensions of control and sensing uncertainty, the following table
describes the types of tasks considered either directly or indirectly by this thesis.
Essentially, the natural approach is to pair up non-deterministic control with
non-deterministic sensing, and probabilistic control with probabilistic sensing. Entries
with a check mark (✓) refer to task specifications that are special cases of either the general
preimage framework or the material discussed in this thesis. Entries that specify
section or chapter numbers refer to material treated in detail in the thesis.


                                      Control (Action) Uncertainty
    Sensing Uncertainty      Perfect    Probabilistic       Non-Deterministic
    Perfect                  ✓          ✓
    Probabilistic            ✓          §3.4; §3.5; §5
    Partially Adversarial    §2.4       §5                  §2.4; §3.12.4
    Non-Deterministic        ✓          ✓                   §3.6-§3.11; §4
    Near-Sensorless          ✓          ✓                   §3.13
    Sensorless               ✓          ✓                   §3.13

Descriptions of tasks considered by this thesis.

One issue that these tables do not highlight is the relationship of non-deterministic
models to probabilistic models. In some cases the world may behave probabilistically,
even though the model is non-deterministic. Section 5.2 treats this topic briefly.
The topic arises naturally in the analysis of strategies formulated in terms of the
non-deterministic model. A guaranteed or randomized strategy that assumes a
non-deterministic description of uncertainty is certain to succeed independent of the actual
instantiation of errors. However, in order to perform a specific rather than a
worst-case performance analysis, it is often useful to assume a particular instantiation of
the  sensing  and  control  errors,  such  as  assuming  some  probabilistic  model.  For  those 
cases  it  is  important  to  understand  the  relationship  between  the  worst-case  model  and 
the  probabilistic  model.  Indeed,  most  of  chapter  5  is  concerned  with  the  analysis  of  a 
simple  randomized  strategy,  modelled  after  the  example  of  section  2.4.  The  strategy 
is  general  enough  to  succeed  under  a  variety  of  worst-case  scenarios.  In  order  to  gain 
some  appreciation  for  the  behavior  of  the  strategy,  however,  it  is  useful  to  assume  a 
pair  of  idealized  probabilistic  distributions  describing  the  sensing  and  control  errors. 


2.7  Summary 

This  chapter  has  briefly  outlined  the  basic  focus  of  the  thesis.  The  chapter  defined 
different  types  of  uncertainty,  and  different  approaches  for  planning  strategies  that 
solve  tasks  in  the  presence  of  uncertainty.  The  focus  of  the  thesis  is  on  randomized 
strategies,  with  a  particular  emphasis  on  simple  feedback  loops.  A  simple  feedback 
loop  only  considers  current  sensory  information  in  deciding  on  a  course  of  action. 
Randomized  simple  feedback  loops  expect  as  input  a  progress  measure,  perhaps  in  the 
form  of  a  nominal  plan  for  attaining  the  goal.  The  randomized  feedback  loop  attempts 
at  each  instant  to  move  in  a  manner  that  makes  progress.  If  this  is  not  possible,  then 
the  system  makes  a  random  motion.  The  chapter  included  an  example  consisting  of 
a  randomized  strategy  for  pushing  a  peg  on  a  surface  into  a  two-dimensional  hole. 

More  generally,  randomization  is  useful  because  it  permits  solutions  to  tasks  for 
which  there  are  no  guaranteed  solutions,  because  it  simplifies  the  planning  process, 
and  because  it  reduces  brittleness.  Brittleness  is  reduced  because  randomization 
can  blur  the  significance  of  environmental  details.  Rather  than  requiring  a  detailed 
analysis  of  an  environment,  a  system  can  instead  rely  on  randomization  to  effectively 
ignore  details  below  a  certain  scale. 




Chapter  3 


Randomization  in  Discrete  Spaces 


This  chapter  examines  the  role  of  randomized  strategies  in  the  solution  of  tasks  that 
may  be  represented  by  a  set  of  discrete  states  and  actions.  The  chapter  will  also 
indicate  how  to  plan  strategies,  with  an  emphasis  on  finding  strategies  that  may  be 
planned  and  executed  quickly.  In  particular,  it  will  be  shown  that  there  are  some 
tasks  for  which  randomized  solutions  execute  more  quickly  on  the  average  than  do 
guaranteed  solutions  in  the  worst  case.  In  general,  of  course,  a  given  task  may  not 
have  a  guaranteed  solution,  but  we  will  see  that  under  very  simple  conditions  there 
is  always  a  randomized  solution  to  a  task  specified  on  a  discrete  space.  However,  the 
expected  execution  time  may  be  very  high. 


3.1  Chapter  Overview 

This  first  section  provides  a  brief  guide  to  the  organization  of  this  chapter. 


Basic  Definitions 

The  first  main  section  (§3.2)  presents  a  more  detailed  version  of  the  basic  definitions 
of  chapter  2,  specialized  to  tasks  on  discrete  spaces.  The  section  begins  with  the 
definition  of  tasks  in  the  non-deterministic  setting,  then  moves  on  to  the  probabilistic 
domain.  Next  the  section  considers  the  problem  of  planning  guaranteed  or  optimal 
strategies  in  the  probabilistic  setting.  In  particular,  the  Dynamic  Programming 
Approach  is  reviewed.  This  planning  approach  applies  with  slight  variations  to 
the  non-deterministic  setting  as  well.  Finally,  the  section  ends  with  some  technical 
subsections  that  elaborate  on  the  definition  of  knowledge  states  and  a  connectivity 
assumption.  Knowledge  states  reflect  the  uncertainty  with  which  a  system  knows  its 
location  at  run-time.  The  connectivity  assumption  rules  out  consideration  of  tasks  in 
which massive failure can occur.

Random Walks

As we noted in chapter 2, random walks form one of the most basic types of randomized
strategies. In particular, the results developed in the context of random walks are
basic to the understanding of simple feedback loops. Section 3.4 considers random
walks, and section 3.5 introduces the notion of expected progress. The latter section
defines  the  expected  velocity  at  a  state  relative  to  a  labelling  of  the  state  space.  The 
section  proceeds  to  show  that  this  notion  of  an  expected  velocity  possesses  some  of  the 
standard  properties  of  a  deterministic  velocity.  In  particular,  if  the  expected  velocity 
at  all  states  points  towards  the  goal  and  is  uniformly  bounded  away  from  zero,  then 
an  upper  bound  for  the  time  to  attain  the  goal  is  given  by  the  distance  from  the  goal 
divided  by  the  velocity  bound. 

Planning  with  Randomization 

Sections  3.6  through  3.11  consider  the  general  problem  of  planning  strategies 
that  purposefully  randomize.  This  planning  approach  is  built  on  the  dynamic 
programming  approach  used  for  generating  guaranteed  strategies. 

Extensions  and  Specializations 

The  remaining  sections  discuss  various  extensions  and  specializations  of  randomized 
strategies.  Of  particular  interest  are  near-sensorless  tasks.  In  these  tasks  the  system 
must  rely  almost  entirely  on  its  predictive  ability  to  attain  a  goal.  The  only  sensing 
information  available  is  whether  or  not  the  goal  has  been  attained.  By  including  this 
one  bit  of  information  it  is  possible  to  develop  randomized  strategies  structured  as 
loops  that  repeatedly  attempt  to  attain  the  goal. 


3.2  Basic  Definitions 

This  section  presents  the  basic  definitions  of  actions,  sensors,  and  tasks  on  discrete 
spaces.  Section  2.2  already  explained  some  of  these  concepts.  The  current  section 
elaborates  on  more  of  the  technical  details.  The  presentation  of  these  definitions  is 
in  the  context  of  both  non-deterministic  and  probabilistic  actions  and  sensors.  The 
basic  approach  is  the  same  for  both  types  of  uncertainty.  Subtle  differences  between 
the  non-deterministic  and  probabilistic  cases  are  mentioned  as  necessary. 

3.2.1  Discrete  Tasks 

We  should  convince  ourselves  that  there  are  tasks  that  may  be  represented  in  discrete 
terms.  Recall  that  some  examples  were  given  in  chapter  2,  in  particular  in  section 
2.2.1.  A  typical  such  task  is  given  by  the  stable  configurations  under  gravity  of  a 
polyhedral  object  resting  on  a  planar  surface.  Indeed,  if  one  drops  a  polyhedral 
object onto a horizontal table under the influence of gravity, with probability one it
will come to rest on one of the faces comprising the convex hull of the object. There are
finitely many such faces. Thus, although the natural configuration space of the object
is a six-dimensional space consisting of the three translational and three rotational
degrees of freedom of the object, if the task only requires examination of the object’s
stable resting configurations, then the induced state space is finite. Determining the
transitions  between  these  stable  states  may  require  a  dynamical  analysis  in  the  full 
six-dimensional  (or  higher)  state  space  of  the  object,  but  once  that  analysis  has  been 
performed,  the  planning  of  operations  can  occur  in  the  finite  and  discrete  state  space. 

Even  though  the  state  space  may  be  discrete  it  may  not  be  immediately  apparent 
that  the  set  of  transitions  between  the  states  is  finite.  Although  there  actually  may 
be a continuum of actions, in many cases there is a natural partitioning of this
continuum  into  a  finite  collection  of  equivalence  classes,  where  each  action  in  an 
equivalence  class  has  the  same  effect  in  terms  of  the  transitions  on  the  underlying 
state  space.  For  instance,  if  we  are  interested  in  the  stable  resting  configurations  of 
an  object  on  a  table,  we  may  alter  those  resting  configurations  by  exerting  a  force 
on  the  object  through  its  center  of  mass.  In  that  case,  we  can  partition  the  space  of 
forces  into  regions  whose  qualitative  behavior  differs  across  regions  but  is  identical 
within  a  region.  For  instance,  forces  that  point  into  the  friction  cone,  thus  causing  no 
motion,  constitute  one  region.  Other  regions  might  include  those  forces  that  cause 
sliding,  and  those  that  cause  the  object  to  flip  from  one  stable  configuration  to  one 
or  more  other  stable  configurations. 

The  representation  of  tasks  is  a  difficult  issue.  In  some  cases,  problems  that 
appear to reside in a continuum state space may be transformed into equivalent or
similar  problems  that  reside  in  finite  state  spaces.  The  details  of  the  transformation 
tend  to  be  task-specific,  although  often  stability  under  some  set  of  actions  may  be 
used  as  a  criterion  in  defining  the  discrete  states.  The  work  of  Brost  ([Brost85]  and 
[Brost86])  involves  such  a  transformation  for  the  problem  of  pushing  and  grasping 
planar  polygonal  objects.  Mani  and  Wilson  [MW]  used  a  similar  transformation 
in their work on pushing, and Erdmann and Mason [EM] employed a stable-under-gravity
transformation in their work on orienting planar parts in a tray.

A  slightly  different  type  of  transformation  is  given  by  the  examples  of  gear-meshing 
and  object-sieving  cited  in  Chapter  1.  Here  in  some  sense  there  are  two  states, 
namely  SUCCESS  and  FAILURE.  A  complicated  higher-dimensional  analysis  was  used 
to  determine  the  effect  of  a  particular  action,  that  is,  to  compute  the  probability  of 
success  in  each  example.  However,  once  that  probability  had  been  computed,  the 
task  could  be  represented  by  a  discrete  state  space,  with  a  probabilistic  transition 
graph. Certainly, more complex graphs can be envisioned, especially for the sieve
task, in which one could imagine a series of sieves arranged vertically above each
other. In that case a natural discrete graph is given by states corresponding to the
regions between the different sieve levels. Assuming that one does indeed randomize
the  object’s  configuration  between  sieves,  there  is  no  need  to  accurately  model  this 
configuration,  and  it  becomes  sufficient  to  collapse  all  configurations  between  two 
sieves  into  a  single  state.  Of  course,  if  one  is  interested  in  synthesizing  strategies  by 
varying the possible motions through the sieves, then one may have to return to the
full two-dimensional continuum configuration space of the part being moved. Again,
this may not be such a problem, if one decides to limit the possible sets of motions
to  a  finite  class,  either  by  only  considering  finitely  many  or  by  partitioning  them  into 
equivalence  classes  relative  to  some  relation. 

3.2.2  Discrete  Representation 

This  section  provides  the  formal  representation  of  tasks  in  which  the  relevant  state 
space  and  action  set  are  discrete  and  finite.  The  development  will  assume  non- 
deterministic  actions  and  sensors.  More  specialized  actions  and  sensors,  such  as 
probabilistic  ones,  are  discussed  in  chapter  2.  Additionally,  sections  3.2.3  and  3.2.4 
discuss  probabilistic  actions,  sensing,  and  planning. 

States 

In a discrete problem we are given a finite set of states $S = \{s_0, s_1, s_2, \ldots, s_n\}$, and a
finite set of actions $\mathcal{A} = \{A_1, A_2, \ldots, A_m\}$. In principle, one could define several sets
of actions, each set representing the actions that are applicable at a particular state.
However, we will simply assume that every action is applicable at every state. This
is  an  unrestrictive  assumption  that  simplifies  the  notation  in  discussing  the  effects  of 
actions  when  the  current  state  is  unknown. 

Actions 

The actions are non-deterministic, that is, given some starting state $s$, the result
of applying an action $A$ may be any one of a possible set of states
$F_A(s) = \{s_{i_1}, s_{i_2}, \ldots, s_{i_k}\} \subset S$. This set is called the forward projection of the state $s$
under action $A$. Figure 3.1 shows how we will represent non-deterministic actions
graphically. In the figure, action $A_1$ may have one of three results when applied to
state $s_0$, but has precisely one result when applied to states $s_1$, $s_2$, or $s_3$. Symbolically,
we would write this as:


$$\begin{aligned}
s_0 &\mapsto s_1, s_2, s_3 \\
s_1 &\mapsto s_1 \\
s_2 &\mapsto s_2 \\
s_3 &\mapsto s_3.
\end{aligned}$$

Section  2.2.2  contained  some  examples  of  non-deterministic  actions. 
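
A forward projection is naturally stored as a table from states to sets of states. The sketch below encodes an action like the $A_1$ described for figure 3.1; the particular successors listed for `s0` are an assumption made for illustration. The helper `forward_projection`, which extends the projection from a single state to a set of possible states by taking a union, is also an illustrative addition rather than a definition quoted from the thesis.

```python
# Forward projections F_A1(s) for one non-deterministic action.
# The successor states of s0 are assumed for illustration.
F_A1 = {
    "s0": frozenset({"s1", "s2", "s3"}),   # three possible results
    "s1": frozenset({"s1"}),               # exactly one result
    "s2": frozenset({"s2"}),
    "s3": frozenset({"s3"}),
}

def forward_projection(F, possible_states):
    """Union of the forward projections of every state the system might be in."""
    result = set()
    for s in possible_states:
        result |= F[s]
    return frozenset(result)

from_s0 = forward_projection(F_A1, {"s0"})
from_s1_or_s2 = forward_projection(F_A1, {"s1", "s2"})
```

Because every action is assumed applicable at every state, the table has an entry for each state, which keeps the union well defined when the current state is unknown.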

As  another  example  of  a  non-deterministic  action,  consider  an  Allen  wrench  in 
contact  with  a  tabletop,  as  shown  in  the  top  portion  of  figure  3.2.  Suppose  a  force 
is  applied  through  the  center  of  mass  as  shown.  Depending  upon  the  coefficient  of 
friction,  the  accuracy  of  the  applied  force,  the  position  of  the  center  of  mass,  and  so 
forth, there are two possible final stable states of the Allen wrench. These are shown
in the lower portion of figure 3.2. If the parameters determining the motion of the wrench
cannot be modelled accurately, for instance if the coefficient of friction is unknown,
then the action should be modelled non-deterministically.

Figure 3.1: Graphical representation of a non-deterministic action $A_1$.

Tasks 

We will assume that tasks are specified as goal states that should be achieved. That
is, there is some set $\mathcal{G} \subset S$ of states, whose attainment constitutes completion of the
task. By attainment, we will mean recognizable attainment, that is, the system is in
a goal state and knows that it is in a goal state.

Similarly, the system is assumed to initially start in some subset $\mathcal{I} \subset S$ of states.

Sensors 

Finally,  we  should  comment  on  sensors.  Sensors  may  or  may  not  be  available.  We  shall 
model  a  sensor  as  a  relation  between  states  and  subsets  of  states.  In  other  words, 
given  that  the  system  is  in  some  state,  the  sensor  returns  some  subset  of  possible 
interpretations.  See  section  2.2.3  for  a  description  of  possible  types  of  sensors  and 
sensory  interpretations.  In  general,  the  sensor  need  not  be  deterministic,  that  is,  for 
a  given  state,  the  sensor  may  return  one  of  several  possible  sets  of  interpretations. 
However,  we  will  assume  that  there  exists  at  least  one  possible  interpretation  set  for 
any  given  state.  This  assumption  is  always  easily  satisfied,  since  one  can  if  necessary 
take  this  interpretation  set  to  be  the  entire  state  space.  See  also  section  3.2.5  below. 

Figure 3.2: The force applied to the Allen wrench in the top of the figure will cause
the  wrench  either  to  slide  without  rotation  or  to  rotate  and  possibly  slide.  The  actual 
motion  depends  on  the  coefficient  of  friction.  If  the  coefficient  of  friction  is  not  known 
it  is  useful  to  model  the  force  as  a  non-deterministic  action. 

Functional Representation

If  we  wanted  to  express  actions  and  sensors  as  functions,  then  the  proper  encoding 
would  be: 


If $A \in \mathcal{A}$, then $A : S \to 2^S$,

where S is the set of states and $2^S$ is the set of all subsets of S. Similarly, we can
model the sensor function as a mapping E from states to all sets of subsets of states:

(3.1)  $E : S \to 2^{2^S}$.

In other words, for any state s, E(s) is a collection of sets, say $E(s) = \{I_1, \ldots, I_\ell\}$.
We will refer and have been referring to each $I_i$ as a sensory interpretation set. E(s)
describes all possible sensory interpretation sets that might arise at run-time whenever
the system is in state s. This means that at run-time the physical sensor can return
some value whose interpretation is one of the subsets $I_i$ of the state space.

For a perfect sensor, the sensing function becomes E(s) = {{s}}, for every s ∈ S.
Abusing notation we will sometimes write this as E(s) = s. On the other extreme,
if no sensor is available, then E reduces to E(s) = {S} for every s ∈ S. Again,
abusing notation we will sometimes write this as E(s) = S. See section 2.2.3 for some
examples of sensing uncertainty.
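
The functional encoding above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-state space, the action table `A1`, and the sensor `coarse_sensor` are hypothetical and do not come from the thesis. Actions return a set of states, and sensors return a collection of sets of states.

```python
# Hypothetical three-state space, used only for illustration.
S = frozenset({"s0", "s1", "s2"})

# A non-deterministic action A : S -> 2^S, stored as a lookup table (assumed).
A1 = {
    "s0": frozenset({"s1", "s2"}),  # from s0 the outcome is uncertain
    "s1": frozenset({"s1"}),
    "s2": frozenset({"s2"}),
}

def perfect_sensor(s):
    """E(s) = {{s}}: a single interpretation set containing only s."""
    return {frozenset({s})}

def no_sensor(s):
    """E(s) = {S}: a single interpretation set, the whole state space."""
    return {S}

def coarse_sensor(s):
    """A made-up non-deterministic sensor: it may report {s} exactly,
    or a coarser set that also contains s2."""
    return {frozenset({s}), frozenset({s, "s2"})}
```

Each value returned by a sensor function is a set of frozensets, mirroring the type $2^{2^S}$ of equation (3.1).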

These  representations  are  intended  only  to  describe  the  character  of  the  range  of 
actions and sensors. In other words, actions map to sets of states, while sensors can
return  any  one  of  a  collection  of  sets  of  states.  Particularly  in  the  case  of  sensors  the 
representation  (3.1)  is  much  too  general.  We  will  impose  additional  constraints  on 
this  representation  in  order  to  derive  a  physically  reasonable  model  in  our  discussion 
on  knowledge  states  below  (see  section  3.2.5). 

Two  comments  should  be  made.  First,  sometimes  it  is  useful  to  break  the  sensor 
function  into  two  parts.  The  first  part  models  the  sensor  values  that  may  result  upon 
examination  of  the  sensor  when  the  system  is  in  a  particular  state.  We  will  denote 
a  possible  such  sensor  value  by  s'.  The  second  part  models  the  interpretations  of 
these  sensor  values  as  sets  of  states.  In  particular,  if  s'  is  an  observed  sensor  value, 
then I(s') will denote its set of interpretations. See, for instance, [TMG]. Often the
second  of  these  functions  follows  from  the  first,  so  we  have  decided  to  collapse  the 
representation.  However,  for  some  of  the  examples  in  the  thesis,  when  we  derive  the 
sensory  interpretation  sets  possible  at  a  given  state,  we  may  first  determine  the  actual 
values  {s'}  returned  by  a  physical  sensor,  then  map  these  to  their  interpretation  sets 
{I(s')}. In any event, no serious information is lost by mapping directly from states
to  possible  interpretation  sets. 

The  second  comment  concerns  the  domain  of  the  sensor  function,  which  was  taken 
to  be  the  state  space.  Sometimes  a  sensor’s  value  may  depend  on  a  sequence  of  states, 
or  on  some  other  parameter,  rather  than  on  just  the  current  state.  This  is  particularly 
true in the continuous case, where a physical sensor may be averaging noisy
measurements over time before reporting these to the control system. In the discrete
case this seems less likely, and so is not modelled here. Further, any dependence on
an  unmodelled  parameter  can  always  be  collapsed  into  further  non-determinism  in 
the  function  E.  This  conservatively  preserves  the  sensor’s  response,  although  it  may 
weaken  the  power  of  the  executive  system  in  making  decisions.  Another  approach 
is  to  augment  the  definition  of  the  system’s  state  in  order  to  incorporate  the  sensor 
state. 

Notice  that  further  variations  on  this  model  are  possible.  For  instance,  the  effect 
of  actions  could  be  made  time-dependent,  as  could  the  results  returned  by  the  sensors. 
We  will  not  consider  such  variations. 


3.2.3  Markov  Decision  Processes 

Non-Determinism  and  Knowledge  Paucity 

We  have  thus  far  chosen  to  represent  transitions  as  non-deterministic  transitions. 
This reflects the presence of uncertainty in the actions we are modelling. This model
does  not  incorporate  any  further  knowledge  about  the  nature  of  the  uncertainty  in 
the  actions. 

In  some  cases  the  uncertainty  may  be  due  to  a  paucity  of  knowledge  in  modelling 
the  actions  on  the  state  space,  rather  than  an  inherent  non-determinism  in  the  actions 
themselves. For instance, it may turn out in figure 3.1 that action A1 actually always
moves from state s0 to state s1, but this is simply not known to the task-system.


Probabilistic  Actions  and  Optimality 

In  other  situations  one  may  have  enough  information  to  think  of  the  transitions 
between  states  as  being  probabilistic.  In  other  words,  associated  with  each  action  and 
each  start  state  is  a  distribution  function,  describing  the  probabilities  of  attaining  the 
states in the forward projection $F_A$ of the start state. If actions may be described
using  probabilistic  transitions,  then  it  is  natural  to  formulate  optimality  problems  in 
terms  of  expected  cost  for  some  cost  function  defined  on  the  states  and  the  transitions 
between  them.  A  typical  problem  is  to  find  the  sequence  of  actions  that  attains  a 
goal  state  in  minimum  expected  time.  Such  problems  are  known  as  Markov  Decision 
Processes, and have been studied for several decades. Recent results by [Pap] and
[PT] have characterized these problems in terms of PSPACE. In particular, the general
problem of finding the minimum expected cost sequence of actions of a given length
is shown to be PSPACE-hard. Various specializations of the problem are actually
in PSPACE. Of particular interest are the perfect-sensing and no-sensing cases. The
latter problem is shown to be NP-complete, while the former is shown to be
P-complete. A standard approach for computing optimal decisions in the perfect-sensing
case is to use dynamic programming (see, for example, [Bert]).

Probabilistic Sensors

Note  that  sensing  may  also  be  formulated  probabilistically.  There  are  at  least  two 
natural  ways  of  doing  this.  In  our  current  representation,  if  the  system  is  in  state  s, 
then  E(s)  is  a  collection  of  sets.  This  means  that  at  execution  time  the  sensor  can 
return  any  one  of  the  sets  in  E(s).  Each  set  is  a  sensor  interpretation  describing  the 
possible  states  of  the  system.  One  possibility  for  a  probabilistic  sensor  would  be  to 
define  a  probability  distribution  over  this  collection  of  sets.  In  other  words,  for  each 
state  of  the  system,  the  sensor  has  a  certain  probability  of  returning  any  given  set  of 
interpretations.  Another  possibility  is  to  not  model  E  as  returning  different  possible 
sets  of  states,  but  to  instead  model  E  as  returning  different  possible  probability 
distributions.  In  other  words,  for  each  state  of  the  system  s,  E(s)  is  a  collection 
of  probability  distributions  over  the  state  space.  One  can  merge  these  two  variations 
by  assigning  probabilities  to  each  of  the  probability  distributions  in  the  collection 
E(s).  Indeed,  this  is  often  the  approach  taken.  For  instance,  if  we  have  a  Gaussian 
sensor,  then,  for  each  state  of  the  system,  we  can  associate  a  probability  density  to 
the possible sensor values. And by inverting these distributions using a Bayesian
approach,  we  can  think  of  each  sensor  value  as  defining  a  probability  distribution  on 
the  state  space.  See  also  section  3.2.6. 
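
The Bayesian inversion mentioned above can be illustrated with a small sketch. Everything specific here is an assumption made for the example: three states placed at hypothetical positions on a line, a uniform prior, and a Gaussian sensor with standard deviation `sigma`. The function `posterior` turns one observed sensor value into a probability distribution on the state space.

```python
import math

def gaussian_pdf(z, mean, sigma):
    """Density of a Gaussian sensor reading z when the true position is `mean`."""
    return math.exp(-0.5 * ((z - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def posterior(prior, positions, z, sigma):
    """Bayes' rule: P(s | z) is proportional to p(z | s) * P(s)."""
    unnormalized = {s: gaussian_pdf(z, positions[s], sigma) * p for s, p in prior.items()}
    total = sum(unnormalized.values())
    return {s: w / total for s, w in unnormalized.items()}

# Hypothetical one-dimensional layout of three states and a uniform prior.
positions = {"s1": 0.0, "s2": 1.0, "s3": 2.0}
prior = {s: 1.0 / 3.0 for s in positions}

# A reading near position 1.0 should concentrate the belief on s2.
belief = posterior(prior, positions, z=0.9, sigma=0.5)
```

With a Gaussian sensor every state keeps non-zero posterior probability, which is one reason a probabilistic interpretation cannot simply be treated as a set of states.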

3.2.4 Dynamic Programming Example

This section reviews and demonstrates the use of dynamic programming by a
simple  example.  The  main  reason  for  reviewing  dynamic  programming  is  its  use  of 
backchaining,  a  method  that  is  useful  for  computing  guaranteed  plans  in  the  presence 
of  uncertainty.  We  will  state  the  example  in  a  probabilistic  setting.  However,  it  should 
be  understood  that  the  same  approach  applies  to  planning  guaranteed  strategies  in  the 
presence  of  non-deterministic  uncertainty.  We  briefly  indicate  the  planning  process 
in  the  non-deterministic  case. 

A  Probabilistic  Example 

The  example  consists  of  a  series  of  states  connected  by  actions  that  have  probabilistic 
transitions  (see  figure  3.3).  After  any  transition,  sensors  report  the  resulting  state 
with  complete  accuracy.  The  starting  state  can  also  be  sensed  with  perfect  accuracy. 
The  task  is  to  determine  a  mapping  from  knowledge  states  to  actions  that  maximizes 
the  probability  of  attaining  the  goal  in  a  specific  number  of  steps.  This  mapping 
constitutes  a  plan  or  a  strategy  for  attaining  the  goal.  Knowledge  states  are  discussed 
further  in  sections  2.3.3  and  3.2.5.  Intuitively,  a  knowledge  state  describes  the 
system’s  current  best  estimate  of  a  region  in  which  it  is  located.  A  knowledge  state  is 
determined  by  current  and  past  sensory  information,  as  well  as  by  predictions  based 
on  executed  actions.  With  perfect  sensing,  the  relevant  knowledge  states  are  simply 
the  actual  states  of  the  system. 

The  basic  idea  of  dynamic  programming  is  to  maximize  (or  minimize)  some  value 
function in terms of the actions available and the number of steps remaining to be

Figure 3.3: A state graph with probabilistic transitions. There are four states and
three actions. The label on an arc indicates the probability that the transition will
be taken when the specified action is executed. All transitions not indicated are
self-transitions.


executed.  At  each  stage,  an  action  is  selected  for  each  state  that  would  maximize 
the  value  function  given  that  there  remain  a  certain  number  of  steps  to  be  executed. 
This  maximization  is  performed  by  first  recursively  determining  the  maximum  values 
obtainable  for  each  state  given  one  fewer  step,  then  selecting  an  action  for  the  current 
state  that  maximizes  the  expected  value  of  moving  to  another  state.  One  starts  the 
whole  process  off  by  assigning  values  to  each  state  that  reflect  the  value  of  the  value 
function  if  no  actions  whatsoever  remain  to  be  executed.  This  is  exactly  what  it  means 
to  backchain  from  a  goal.  For  the  situation  in  which  one  is  looking  for  strategies 
with  maximal  probability  of  success,  the  value  function  represents  the  probability  of 
achieving  the  goal  in  the  remaining  steps.  Goal  states  are  initially  assigned  a  value  of 
1; non-goal states a value of 0. Further, the value in the kth stage of the computation
for  a  particular  state  is  the  probability  of  attaining  the  goal  from  that  state  in  at  most 
k  steps,  assuming  that  the  system  can  sense  perfectly  and  that  it  always  executes  the 
maximizing  action  at  each  state. 

The  backchaining  maximization  of  dynamic  programming  may  be  depicted  by 
a  table  (see  below).  The  columns  of  the  table  correspond  to  the  stages  in  the 
backchaining  process;  the  rows  correspond  to  the  knowledge  states  of  the  execution 
system.  Counting  from  right  to  left,  an  entry  in  the  kth  column  of  the  table  for 
knowledge state Ki specifies the action to be taken at run time if there remain k
time-steps in which to execute actions and if the system’s current knowledge state is
Ki. The entry in the table might also specify the value of the value function computed
at that point in the backchaining process. For instance, the entry might specify the
maximal probability of success given that there remain k actions to execute and given
that the system is in some state si.

Consider now figure 3.3, which depicts four states and three actions. The
transitions resulting from the execution of actions are labelled with probabilities.
All actions are applicable in all states. However, for simplicity, we have not drawn
transitions that leave a state unchanged. For instance, if the current state is state s2,
then action A2 moves to state s4 with probability 1/10, while action A1 remains in
state s2 with probability 1.

Suppose  that  state  s4  is  the  goal  state.  Then  the  value  assigned  to  the  four  states 
at the zeroth stage of backchaining is 0 for states s1, s2, and s3, and 1 for state s4.
At the first stage, the values assigned are 1/4 for state s1, 1/10 for state s2, and 1
for states s3 and s4. These values reflect the maximum probabilities of attaining the
goal  in  one  or  zero  steps. 

The  following  table  reflects  the  computations  for  four  stages.  The  entries  in  the 
table  are  the  computed  maximum  probabilities,  along  with  the  correct  action  to  take 
in  that  state,  given  the  number  of  steps  remaining.  In  this  example,  the  optimal 
actions  for  each  of  the  states  happen  to  be  the  same  across  stages,  but  that  need  not 
be  the  case  in  general. 


                      Steps Remaining
States        3            2            1            0

s1          1; A1     [illegible]    1/4; A1         0
s2          1; A2        1; A2      1/10; A2         0
s3          1; A3        1; A3        1; A3          0
s4          1; stop      1; stop      1; stop     1; stop

Probabilities of success; optimal actions.

The  table  shows  that  the  goal  can  be  achieved  with  certainty  from  any  state  using 
no  more  than  three  steps,  as  one  would  expect. 
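
The backchaining computation can be sketched as follows. Since figure 3.3 is not reproduced here, the transition graph in the code is partly an assumption: the probabilities 1/4 (s1 to s4 under A1), 1/10 (s2 to s4 under A2), and 1 (s3 to s4 under A3) follow the values quoted in the text, while the remaining branches (s1 to s2 and s2 to s3) are hypothetical choices that are merely consistent with those values. Values for s1 with two steps remaining therefore depend on the assumed branch. Ties between equally good actions are broken in favour of the action chosen one stage earlier, which is a design choice of this sketch.

```python
from fractions import Fraction as F

STATES = ["s1", "s2", "s3", "s4"]
GOAL = "s4"

# P[action][state] -> {next_state: probability}.
# States not listed under an action are self-transitions.
P = {
    "A1": {"s1": {"s4": F(1, 4), "s2": F(3, 4)}},    # branch to s2 is assumed
    "A2": {"s2": {"s4": F(1, 10), "s3": F(9, 10)}},  # branch to s3 is assumed
    "A3": {"s3": {"s4": F(1)}},
}

def transitions(action, state):
    return P[action].get(state, {state: F(1)})

def expected(action, state, prev):
    """Expected value of executing `action` in `state`, given last stage's values."""
    return sum(p * prev[t] for t, p in transitions(action, state).items())

def backchain(k):
    """Return (values, policy); index j holds the entries for j steps remaining."""
    values = [{s: F(int(s == GOAL)) for s in STATES}]
    policy = [{s: "stop" if s == GOAL else None for s in STATES}]
    for _ in range(k):
        prev, last = values[-1], policy[-1]
        v, pi = {GOAL: F(1)}, {GOAL: "stop"}
        for s in STATES:
            if s == GOAL:
                continue
            scores = {a: expected(a, s, prev) for a in P}
            v[s] = max(scores.values())
            # Ties are broken in favour of the action chosen one stage earlier.
            ties = [a for a in P if scores[a] == v[s]]
            pi[s] = last[s] if last[s] in ties else ties[0]
        values.append(v)
        policy.append(pi)
    return values, policy

values, policy = backchain(3)
```

Exact rational arithmetic is used so that the computed probabilities can be compared directly with the fractions quoted in the text.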

Complexity 

Computing  such  a  table  out  to  k  stages  for  a  state  space  with  n  states  and  O(m) 
actions can be done straightforwardly in time $O(k m n^2)$. In particular, the solution is
in P (polynomial time). (In this complexity estimate we are ignoring the precision of
the transition probabilities, that is, we are assuming that addition and multiplication
can be done in constant time.)

A  Non-Deterministic  Example 

For  completeness  of  exposition  suppose  that  the  transition  graph  of  figure  3.3  is 
non-deterministic  rather  than  probabilistic.  In  this  case  the  value  function  to  be 
maximized by the dynamic programming approach is a boolean function. A “1” of
this function corresponds to guaranteed success, while a “0” corresponds to possible
failure. The dynamic programming table for the non-deterministic case is almost
identical in appearance to the table for the probabilistic case. A blank entry in the
table indicates that success cannot be guaranteed in the number of steps remaining
from that state. In other words, the boolean value function has value “0”. Conversely,
an entry with an action Ai indicates that the boolean value function has value “1”,
that is, eventual goal attainment is guaranteed if the system executes action Ai.

Again,  the  table  shows  that  the  goal  can  be  achieved  with  certainty  in  at  most 
three  steps. 


                  Steps Remaining
States       3        2        1        0

s1          A1
s2          A2       A2
s3          A3       A3       A3
s4         stop     stop     stop     stop

Actions that guarantee goal attainment.
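
The boolean version of backchaining admits an equally short sketch. As before, the branching structure in `NEXT` is an assumption standing in for figure 3.3; only its qualitative shape matters. An action is recorded for a state when every possible outcome of the action lands in a state from which success was already guaranteed with one fewer step.

```python
STATES = ["s1", "s2", "s3", "s4"]
GOAL = {"s4"}

# NEXT[action][state] -> set of possible next states (assumed branching).
# States not listed under an action are self-transitions.
NEXT = {
    "A1": {"s1": {"s2", "s4"}},
    "A2": {"s2": {"s3", "s4"}},
    "A3": {"s3": {"s4"}},
}

def outcomes(action, state):
    return NEXT[action].get(state, {state})

def guaranteed_plan(k):
    """table[j][s] is an action guaranteeing the goal within j steps, or None."""
    table = [{s: "stop" if s in GOAL else None for s in STATES}]
    winning = set(GOAL)  # states with boolean value 1 at the previous stage
    for _ in range(k):
        row = {}
        for s in STATES:
            if s in GOAL:
                row[s] = "stop"
                continue
            # An action qualifies only if ALL of its outcomes are winning.
            options = [a for a in NEXT if outcomes(a, s) <= winning]
            prev = table[-1][s]
            # Keep the earlier choice when it still works (tie-breaking rule).
            row[s] = prev if prev in options else (options[0] if options else None)
        winning = {s for s in STATES if row[s] is not None}
        table.append(row)
    return table

table = guaranteed_plan(3)
```

A `None` entry plays the role of a blank entry in the table: success cannot be guaranteed from that state in the number of steps remaining.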

3.2.5  Knowledge  States  in  the  Non-Deterministic  Setting 

This  and  the  next  section  explain  how  to  represent  the  possible  states  of  a  system  at 
execution  time,  that  is,  what  the  executive’s  knowledge  states  are.  A  planner  must 
of  course  reason  about  more  knowledge  states  than  actually  occur  during  execution, 
since in general at planning time the outcome of a sensing operation will not be known
precisely. 

Forward  Projection 

First, let us look at the case in which actions are non-deterministic and sensors return
possible sets of interpretations. In this case, at any given time during execution the
actual state of the system is known only to be one of possibly many. Thus the space
of knowledge states is simply the set of all subsets of the state space, namely $2^S$.
Given a set K1 of possible states that the system could be in, and an action A, the
result of executing action A is a new knowledge state K2, given by:

$K_2 = \bigcup_{s \in K_1} F_A(s)$.

In other words, K2 is the union of all the possible non-deterministic transitions
resulting from possible states in K1. Notice that this knowledge is equivalent both
at execution time and at planning time. The process of forming K2 is called forward
projecting set K1 under action A, and is written $K_2 = F_A(K_1)$.

Forward  projections  possess  a  nice  property.  The  forward  projection  of  a  collection 
of  sets  is  just  the  union  of  the  forward  projections  of  the  individual  sets.  This  is 
summarized  in  the  following  lemma. 

Lemma 3.1 Let {Ki} be a collection of knowledge states, and let A be a non-deterministic
action. Then

$F_A\left(\bigcup_i K_i\right) = \bigcup_i F_A(K_i)$.

Proof. Clear from the definition. ∎
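
A forward projection is simply a union over the states of a knowledge state, as the following sketch shows. The action table `A` and the two knowledge states are hypothetical, and states missing from the table are assumed to be left unchanged. The last lines check the distributive property of Lemma 3.1 on this one instance, which of course illustrates the lemma rather than proving it.

```python
def forward_project(action, K):
    """F_A(K): union of the possible outcomes of `action` over all states in K."""
    result = set()
    for s in K:
        # Assumption: a state not listed in the table is left unchanged.
        result |= action.get(s, {s})
    return frozenset(result)

# Hypothetical non-deterministic action and knowledge states.
A = {"s1": {"s2", "s4"}, "s2": {"s3", "s4"}}
K1 = frozenset({"s1"})
K2 = frozenset({"s2", "s3"})

# Lemma 3.1 on this instance: projecting the union equals the union of projections.
lhs = forward_project(A, K1 | K2)
rhs = forward_project(A, K1) | forward_project(A, K2)
```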

Sensing 

Let  us  now  turn  to  the  procedure  by  which  a  run-time  executive  might  update 
its knowledge state using sensing. Given a knowledge state K1 and a sensory
interpretation set I, the resulting knowledge state is K2 = K1 ∩ I. For sensing,
however,  knowledge  at  execution  time  can  be  considerably  different  than  at  planning 
time:  at  execution  time  the  set  I  is  known,  whereas  at  planning  time  the  system  only 
knows  that  I  will  come  from  one  of  several  possible  sets  of  interpretations. 

See again figure 2.6 on page 71, which shows the process of forward projecting
a  knowledge  state  and  intersecting  the  forward  projection  with  the  current  sensory 
interpretation  set. 

The  analogue  to  the  distributive  property  of  forward  projections  is  given  for 
sensory  interpretation  sets  by  the  distributive  property  of  set  intersections. 

Lemma 3.2 Let {Ki} be a collection of knowledge states, and let I be some sensory
interpretation set. Then

$\left(\bigcup_i K_i\right) \cap I = \bigcup_i \left(K_i \cap I\right)$.

Proof. Clear. ∎
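
The difference between execution time and planning time can be made concrete with a small sketch. The function names and the example sets are hypothetical. At execution time a single interpretation set is known, so the update yields one knowledge state; at planning time only the candidate interpretation sets are known, so the planner must carry one successor knowledge state per candidate.

```python
def update_at_runtime(K, I):
    """Execution time: the interpretation set I is known, so K2 = K1 ∩ I."""
    return frozenset(K) & frozenset(I)

def update_at_planning(K, possible_I):
    """Planning time: one successor knowledge state per candidate set I."""
    return {frozenset(K) & frozenset(I) for I in possible_I}

# Hypothetical knowledge state and candidate interpretation sets.
K1 = {"s1", "s2", "s3"}
candidates = [{"s1", "s2"}, {"s3", "s4"}]

observed = update_at_runtime(K1, {"s3", "s4"})
planned = update_at_planning(K1, candidates)
```

The run-time result is always one of the knowledge states the planner had to consider, which is why the planner reasons about more knowledge states than actually occur.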

In  the  next  few  paragraphs  we  will  augment  the  process  by  which  a  system  updates 
its  knowledge  state  using  sensory  information.  Indeed  it  is  sometimes  useful  to  make 
use  of  more  structure  than  that  provided  simply  by  intersecting  the  current  knowledge 
state with the current sensory interpretation set.

Constraints  on  Sensors 

We  will  make  one  further  set  of  assumptions  concerning  the  possible  sensory 
interpretations.  The  purpose  of  these  assumptions  is  to  rule  out  inconsistencies 
that  would  be  possible  given  the  unrestrictive  definition  of  the  sensing  function  E 
in equation (3.1).

Consider figure 3.4. The figure shows the system’s current knowledge state K,
which includes the actual state of the system x. The sensed value is x*, and the
sensory interpretation set I(x*) is given by a disk centered at x*. Unfortunately this
disk does not overlap the knowledge state. Thus if the system updates its knowledge
state by computing K ∩ I(x*), the result will be the empty set. The problem here is
that  the  sensory  interpretation  set  does  not  include  the  actual  state  of  the  system. 

Figure 3.4: The sensory interpretation set in this figure does not overlap the system's
previous  knowledge  state.  This  implies  an  inconsistency.  The  actual  state  of  the 


This leads to our first restriction on the definition of the sensing function E. We
require that a sensory interpretation set always include the actual state of the system.¹

Partial Sensing Consistency Requirement. Let s be a system state, and let
I ∈ E(s) be a possible sensory interpretation set returned by the sensor when the
system is in state s. We require that s ∈ I. This means simply that a state is always
an  interpretation  of  any  sensor  value  to  which  it  can  give  rise. 

Inconsistent  Knowledge  States 

The  example  of  figure  3.4  introduced  the  notion  of  a  sensory  interpretation  set  that 
is inconsistent with the current knowledge state. For the example of the figure, the
inconsistency  is  removed  by  the  partial  sensing  consistency  requirement.  This  is 
because  the  knowledge  state  K  contains  the  actual  state  of  the  system.  However,  if 
the  run-time  executive's  knowledge  state  does  not  contain  the  system’s  actual  state, 
then it is still possible to obtain a sensory interpretation set that does not overlap
the executive’s run-time knowledge state.

¹ In the probabilistic case, it is sometimes useful to relax this requirement. In particular, when
sensory interpretations are density functions with infinite tails it is useful to insist merely that
the sensory interpretation set cover the actual state of the system with some sufficiently high
probability. We will tacitly make use of this version of the partial sensing consistency requirement
in chapter 5. See in particular section 5.2.

There is a subtle issue here that requires further explanation. In particular, why
should  a  system’s  knowledge  state  not  contain  the  actual  state  of  the  system?  This 
may  seem  peculiar,  since  the  knowledge  state  is  intended  to  reflect  the  certainty  with 
which  the  system  knows  its  actual  state.  If  the  knowledge  state  does  not  contain 
the  system’s  actual  state  then  something  must  be  wrong  in  the  modelling  of  the 
information  available  to  the  system,  either  in  the  modelling  of  the  actions  or  in  the 
modelling  of  the  sensors.  This  means  that  if  ever  the  system  encounters  the  empty 
set  upon  having  updated  its  knowledge  state,  then  the  system  knows  immediately 
that  something  is  wrong  in  the  modelling  of  the  task.  In  turn,  this  suggests  that  we 
need  not  worry  about  inconsistent  interpretations,  since  if  an  inconsistency  ever  does 
occur, it must be due to an unmodelled parameter, that is, an event beyond the scope
of the task description. The smart thing to do is to stop the task execution and to
try  to  model  the  unknown  parameter. 

This  explanation  is  correct,  but  it  ignores  part  of  the  motivation  for  the  thesis. 
In  particular,  we  would  like  to  develop  methods  for  solving  tasks  without  having  full 
knowledge  of  all  the  parameters  in  the  system.  The  particular  approach  taken  in 
this thesis is to actively randomize, either by guessing sensor values or by executing
random actions. The randomization is intended to blur the significance of these
unmodelled parameters. Formally, as we indicated in section 2.3.4 on page 73, one
view  of  randomization  is  as  the  random  guessing  of  possible  knowledge  states.  In 
other  words,  the  actual  knowledge  state  of  the  system  is  too  large  for  it  to  execute 
a  useful  strategy,  so  the  system  simply  guesses  that  its  actual  state  lies  in  a  smaller 
set.  The  smaller  set  is  then  assumed  to  be  the  knowledge  state.  Actions  and  sensing 
update this smaller knowledge set, rather than some larger knowledge state, as if it
were  the  correct  description  of  the  run-time  executive's  certainty.  This  approach  will 
be  further  explored  starting  in  section  3.9. 

We see then that the set K ∩ I(x*) readily can be empty, where I(x*) is some
sensory interpretation set and K is a knowledge state. This is because the knowledge
state K may have been randomly selected during a guessing step of a randomized
strategy. This is actually very useful. For, if the run-time executive ever observes
that K ∩ I(x*) = ∅, then it knows that the actual state of the system cannot be in
K. This implies that the original guess of K as an appropriate knowledge state must
have  been  wrong.  Having  determined  that  the  guess  was  wrong,  the  system  can  then 
guess  again,  or  try  some  other  strategy. 
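
The guess-and-reject idea can be sketched as follows. This is a deliberately simplified illustration, not the strategy developed later in the thesis: it assumes a stationary system, a fixed list of candidate guesses, and a fixed sequence of observed interpretation sets, all of which are hypothetical. A guess is discarded as soon as intersecting it with an observation produces the empty set.

```python
import random

def guess_and_check(guesses, observations, rng):
    """Try guessed knowledge states in random order. Return the first guess
    that survives every observation, together with its refined knowledge
    state, or None if every guess is contradicted."""
    order = [frozenset(K) for K in guesses]
    rng.shuffle(order)
    for guess in order:
        current = guess
        for I in observations:
            current = current & frozenset(I)
            if not current:  # K ∩ I is empty: the guess must have been wrong
                break
        else:
            return guess, current
    return None

# Hypothetical guesses and observed interpretation sets.
guesses = [{"s1"}, {"s2"}, {"s3"}]
observations = [{"s2", "s3"}, {"s1", "s2"}]
result = guess_and_check(guesses, observations, random.Random(0))
```

In this instance only the guess {s2} is consistent with both observations, so it is returned whatever order the guesses are tried in.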

Interpreting Sensors More Carefully

This section may be skipped on a first reading. It deals with a technical point
regarding  the  consistency  of  the  sensing  function  E. 

Thus  far  we  have  only  imposed  one  restriction  on  the  character  of  sensory 
interpretations,  namely  the  partial  sensing  consistency  requirement.  This  restriction 
merely ensured consistency between the actual state of the system and observed
sensory interpretations. The requirement may be interpreted as ensuring that sensory
interpretation sets are not too small. However, thus far we have not imposed a
constraint  in  the  other  direction,  to  ensure  that  sensory  interpretation  sets  are  not 
too  large. 

If  sensory  interpretation  sets  are  larger  than  necessary,  then  it  may  be  to 
a  strategy’s  advantage  to  perform  a  more  complicated  operation  than  merely 
intersecting  the  sensory  interpretation  set  with  the  current  knowledge  state.  The 
next  few  paragraphs  indicate  what  is  meant  by  a  sensory  interpretation  set  that  is 
too  large  and  how  a  system  can  better  update  its  knowledge  state.  Fortunately,  it  is 
possible  simply  to  modify  the  sensory  interpretation  sets  prior  to  execution  time  so 
that  they  are  not  too  large.  This  modification  will  be  formulated  in  terms  of  a  second 
consistency  requirement. 

A  fairly  natural  way  in  which  sensory  interpretation  sets  may  be  too  large  is  if 
they  are  chosen  conservatively  to  bound  the  actual  state  of  the  system.  For  instance, 
consider  figure  3.5. 

This  is  an  example  on  a  continuous  space,  but  the  moral  of  the  example  applies 
equally  well  of  course  to  discrete  spaces.  In  this  two-dimensional  example  the 
sensing  error  ball  has  a  radius  varying  as  a  function  of  the  system's  x-coordinate. 
In particular, if the actual state of the system is (x, y), then the range of possible
sensor values is given by a circle centered at (x, y) with radius x/4. This example
is supposed to abstract the notion of a position-dependent error function. Suppose
that the work space is given by the square [0, 1] × [0, 1]. If (x*, y*) is an observed
sensor value, then one may take as the sensory interpretation set I(x*, y*) the circle
of radius 1/4 centered at (x*, y*). Clearly this interpretation set is too large for small
values of x, but it is definitely a conservative approximation, and satisfies the partial
sensing  consistency  requirement. 

Now  consider  the  example  of  figure  3.6.  There  are  two  knowledge  states,  given  by 
the two vertical strips K1 = {(x, y) | 0 ≤ x ≤ 0.4} and K2 = {(x, y) | 0.7 ≤ x ≤ 1}.
Let the observed sensor value be (x*, y*) = (0.6, 0.5), with corresponding sensory
interpretation set I(x*, y*) = $B_{1/4}(0.6, 0.5)$. Clearly this sensory interpretation set
overlaps each of the knowledge states K1 and K2. If a system simply intersects
sensory interpretation sets with knowledge states, then the system would conclude
that its location could be either in the set K1 or the set K2. On the other hand it
is clear to us as outside observers that no point in K1 could have given rise to the
sensor value (0.6, 0.5). This is because the maximum range of possible sensor values
for a point (x, y) ∈ K1 is a disk of radius 0.1. This means that the maximum possible
x*-value observable if the system is in K1 is 0.5. Only system states in the set K2
could give rise to the observed sensor value (0.6, 0.5). However, again, not all of the
system states in the intersection K2 ∩ I(x*, y*) could give rise to the observed sensor
value (x*, y*). In short, even the intersection of the sensory interpretation set with
K2  is  an  overestimate. 

The  previous  example  is  not  surprising.  After  all,  having  conservatively  bounded 
the  actual  sensory  interpretation  sets,  one  would  expect  that  the  run-time  knowledge 
states computed by the system might overestimate uncertainty. The question is
whether  the  structure  of  the  function  E  is  internally  consistent  (see  definition  below). 

Figure 3.5: The sensing error ball in this example is position dependent. If the actual
state of the system is (x, y), the possible sensor values are given by a ball of radius
x/4 centered at (x, y). Over the indicated workspace a conservative approximation
to the sensory interpretation set for an observed sensory value (x*, y*) is given by a
ball of radius 1/4.

Figure 3.6: The sensing error ball about the sensed value (x*, y*) = (0.6, 0.5) overlaps
both knowledge sets K1 and K2. However, the observed sensor value can only
correspond to an actual system state in the set K2. This is because the range of
sensor values for points in K1 has a maximum radius of 0.1. The position-dependent
possible sensor values are described in figure 3.5.

In particular, let us consider the collection of interpretation sets E(s) for some state
s = (x, y). This collection might be of the form

(3.2)  $E(x, y) = \bigcup_{(x^*, y^*) \in B_{x/4}(x, y)} \{ B_{1/4}(x^*, y^*) \},$

or it might be of the form

(3.3)  $E(x, y) = \bigcup_{(x^*, y^*) \in B_{1/4}(x, y)} \{ B_{1/4}(x^*, y^*) \},$

to  name  two  extremes.  The  first  collection  (3.2)  consists  of  all  balls  of  radius  1/4 
whose centers $(x^*, y^*)$ lie within distance $x/4$ of the actual state of the system. In
other  words,  this  collection  correctly  models  the  actual  sensor  values  that  the  system 
might  observe,  but  then  conservatively  bounds  the  interpretations  of  these  sensor 
values.  The  second  collection  (3.3)  consists  of  all  balls  of  radius  1/4  whose  centers 
$(x^*, y^*)$ lie within distance 1/4 of the actual state of the system. In other words, this
collection  not  only  conservatively  bounds  the  interpretations,  but  also  conservatively 
assumes  that  a  greater  range  of  sensor  values  is  possible  than  the  system  will  actually 
observe. 

If  the  sensor  function  E  is  of  the  form  given  by  (3.2),  then  the  system  can  obtain 
additional  information  by  investigating  the  function  E  that  it  cannot  obtain  simply 
by  intersecting  sensory  interpretation  sets  with  knowledge  states.  In  particular,  if 
the  sensing  function  is  of  the  form  (3.2),  then  the  system  can  rule  out  interpretations 
in the set $K_1$ of figure 3.6, while retaining some or all of the interpretations in the
set $K_2$. On the other hand, if the sensing function $E$ is of the form given by (3.3),
then  the  system  can  do  no  better  than  to  intersect  sensory  interpretation  sets  with 
knowledge  states. 

Definition.  In  some  sense  the  sensing  function  given  by  (3.3)  is  internally 
consistent.  By  this  we  mean  that  the  system  cannot  gain  any  extra  information 
by  explicitly  examining  the  structure  of  the  sensing  function,  as  by  examining  the 
collections  E(s)  for  all  states  s.  Instead,  all  the  information  upon  observing  a  given 
sensor value $s^*$ is available in the interpretation set $I(s^*)$.

In  contrast,  the  sensing  function  given  by  (3.2)  is  not  internally  consistent.  There 
are  two  basic  ways  to  make  this  function  internally  consistent.  One  is  to  modify  the 
collections  E(s)  so  that  they  conservatively  bound  the  range  of  possible  sensor  values 
as  in  (3.3).  The  other  is  to  modify  the  actual  interpretation  sets  so  that  they  are 
exact  rather  than  conservative  bounds. 

One question of interest is how a system should update its knowledge state if the
sensing function $E$ is not necessarily internally consistent. Suppose, in particular, that
the system's current knowledge state is $K_1$ and that it has observed a sensor value with
interpretation set $I$. Let us define an operation $\cap'$ that updates the knowledge state
$K_1$ using both the sensory interpretation set $I$ and information about the structure
of the function $E$. We want the updated knowledge state $K_2$ to consist of all states in
both $K_1$ and $I$ that could have given rise to the sensory interpretation set $I$. Formally,
$K_2 = K_1 \cap' I$, where

(3.4)  $K \cap' I = \{ s \in K \cap I \mid I \in E(s) \}.$

This  expression  provides  a  formula  for  ensuring  that  the  sensing  function  E  is 
internally  consistent.  Expression  (3.4)  says  that  one  should  delete  from  a  sensory 
interpretation set $I$ any states that could not possibly give rise to $I$.
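
The update of expression (3.4) can be made concrete on a finite state space. The following is a minimal sketch, assuming that the sensing function is stored as a dictionary from each state to the collection of interpretation sets it can produce; the names `refined_intersection`, `E`, `K` and `I_obs`, as well as the three-state example, are hypothetical and not taken from the text.

```python
from typing import Dict, FrozenSet, Set

State = str
InterpretationSet = FrozenSet[State]


def refined_intersection(
    K: Set[State],
    I: InterpretationSet,
    E: Dict[State, Set[InterpretationSet]],
) -> Set[State]:
    """Return {s in K & I : I in E(s)}, the update of expression (3.4)."""
    return {s for s in K & I if I in E.get(s, set())}


# Hypothetical three-state sensing function (an assumption for illustration).
I_obs = frozenset({"a", "b", "c"})
E = {
    "a": {frozenset({"a"})},             # state "a" can never produce I_obs
    "b": {I_obs},
    "c": {I_obs, frozenset({"c"})},
}
K = {"a", "b"}

plain = K & I_obs                            # ordinary intersection keeps "a"
refined = refined_intersection(K, I_obs, E)  # refined update drops "a"
```

In this example the interpretation set nominally contains state "a", but "a" cannot give rise to it, so the refined update is strictly smaller than the plain intersection.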

We  can  summarize  the  condition  that  a  sensing  function  be  internally  consistent 
by  imposing  an  additional  consistency  requirement.  The  purpose  of  this  requirement 
is to capture the condition under which the operator $\cap'$ reduces to the operator $\cap$.
Combining  this  condition  with  the  partial  sensing  consistency  requirement  yields  the 
following  consistency  requirement. 

Full Sensing Consistency Requirement. Let $E$ be a sensing function on a state
space $S$. Denote by $E(S)$ the set of all possible sensory interpretation sets, that
is, $E(S) = \bigcup_{s \in S} E(s)$. We say that a sensing function satisfies the full sensing
consistency requirement if the following condition holds for all states $s \in S$:

$I \in E(s)$ if and only if $s \in I$ and $I \in E(S)$.

In  other  words,  if  a  state  can  give  rise  to  a  sensory  interpretation  set  then  that 
interpretation  set  must  include  the  state  itself,  and  conversely.  It  was  the  converse 
requirement  that  was  missing  in  the  example  of  figure  3.6.  It  makes  a  lot  of  sense  to 
impose  the  partial  sensing  consistency  requirement,  as  sensors  that  do  not  satisfy  it 
do  not  seem  very  useful.  Once  one  has  the  partial  consistency  requirement,  it  is  easy 
enough  to  impose  the  full  consistency  requirement.  After  all,  suppose  that  one  sees 
a sensory interpretation set $I$ which nominally contains the state $s$. If one examines
$E(s)$ and discovers that $I \notin E(s)$ then one knows that $s$ could not possibly have given
rise to $I$. Thus one may as well replace $I$ with $I - \{s\}$. This was the gist of the
operation $\cap'$ defined by (3.4) above.

For the sake of completeness we prove the following lemma, which establishes that
$\cap$ and $\cap'$ really are the same operator when the full sensing consistency requirement
holds.

Lemma 3.3 Suppose $E$ is a sensing function on a state space $S$ that satisfies the full
sensing consistency requirement. Then $\cap' = \cap$.

Proof. Let $K \subseteq S$ be a knowledge state, and let $I \in E(S)$ be a sensory
interpretation set. We need to show that $K \cap I = K \cap' I$. By the definition (3.4),
we see that $K \cap' I \subseteq K \cap I$. Thus we need only to establish the reverse inclusion.
Suppose that $s \in K \cap I$. In particular $s \in I$. By the full sensing consistency
requirement it follows that $I \in E(s)$. The definition (3.4) then establishes that
$s \in K \cap' I$, as desired. ∎
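
On a small finite example the lemma can be checked by brute force. The sketch below is only an illustration under assumptions: the state space is taken to be the key set of a dictionary `E`, and the two sensing functions `E_ok` and `E_bad` are hypothetical. It tests the full sensing consistency requirement and then compares the two operators over every knowledge state and every interpretation set.

```python
from itertools import combinations


def refined_intersection(K, I, E):
    """K ∩' I: the states of K & I that can actually give rise to I."""
    return {s for s in K & I if I in E[s]}


def satisfies_full_consistency(E):
    """Check: I in E(s) if and only if s in I, for every I in E(S)."""
    all_sets = set().union(*E.values())      # E(S), all interpretation sets
    return all((I in E[s]) == (s in I) for s in E for I in all_sets)


def operators_agree(E):
    """Compare plain and refined intersection on every subset K of S."""
    states = sorted(E)
    all_sets = set().union(*E.values())
    for r in range(len(states) + 1):
        for subset in combinations(states, r):
            K = set(subset)
            for I in all_sets:
                if (K & I) != refined_intersection(K, I, E):
                    return False
    return True


A = frozenset({1, 2})
B = frozenset({2, 3})
E_ok = {1: {A}, 2: {A, B}, 3: {B}}    # fully consistent (hypothetical)
E_bad = {1: {A}, 2: {A}, 3: {B}}      # B contains 2, but 2 never yields B
```

For `E_ok` the two operators coincide, as the lemma asserts. For `E_bad` the converse half of the requirement fails and the refined operator removes state 2 from `B`.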

To summarize, the full sensing consistency requirement ensures that the sensory
interpretation  sets  are  neither  too  small  nor  too  large.  This  means  that  all  the 
information  available  to  the  system  from  the  sensing  function  is  contained  in  the 
individual  sensory  interpretation  sets.  This  is  clearly  a  desirable  property.  In 
particular,  it  permits  the  system  to  update  knowledge  states  with  sensory  information 
using  set  intersections. 

3.2.6  Knowledge  States  in  the  Probabilistic  Setting 

Next, let us consider the case in which all actions are probabilistic and in which
sensory interpretations are also probabilistic. Specifically, let us assume that for a
given action $A$, if the system is in state $s_i$, then it will move to state $s_j$ with probability
$p_{ij}$. The matrix $(p_{ij})$ is known as a probability transition matrix. Similarly, a sensory
interpretation is really a conditional distribution vector $(t_j)$. This says that if the
system was thought to be in state $s_j$ with probability $p_j$ before the sensory operation,
then after the sensory operation it is thought to be in state $s_j$ with probability $p_j t_j / t$,
where $t$ is a normalization factor required to ensure that the resulting probabilities
form a true distribution (see below). [As this expression indicates, the numbers $\{t_j\}$
are usually determined by a Bayesian analysis of how different states can give rise to
different sensor readings.]

In  the  probabilistic  setting,  the  state  of  the  system  is  known  with  some  probability. 
Thus  the  natural  knowledge  states  are  probability  distributions  over  the  state  space 
$S$. In other words, a knowledge state is a collection of $|S|$ non-negative numbers that
add up to one. If the current knowledge state is $K_1 = \{p_0, p_1, \ldots, p_n\}$, and action $A$
has probability transition matrix $(p_{ij})$, then the effect of applying action $A$ is a new
knowledge state $K_2 = \{q_0, q_1, \ldots, q_n\}$, where

$q_i = \sum_{j=0}^{n} p_j \, p_{ji}.$

This  is  just  a  probabilistic  forward  projection.  The  sum  is  similar  to  a  union  operation; 
it measures the probability of moving to state $s_i$ from each state $s_j$ in the system,
multiplied  by  the  probability  of  having  actually  started  in  that  state. 
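
The two probabilistic updates described in this section, forward projection under an action and reweighting by a sensory interpretation, can be sketched in a few lines. This is a minimal sketch, assuming that knowledge states are plain lists of probabilities and that the normalization factor is the sum of the reweighted entries, which is the choice that makes the result sum to one. The function names and the two-state numbers are hypothetical.

```python
from typing import List


def forward_project(p: List[float], P: List[List[float]]) -> List[float]:
    """q_i = sum_j p_j * P[j][i], with P[j][i] the probability of j -> i."""
    n = len(p)
    return [sum(p[j] * P[j][i] for j in range(n)) for i in range(n)]


def sense_update(p: List[float], t: List[float]) -> List[float]:
    """Reweight p_j by t_j and renormalize so the result sums to one."""
    weighted = [pj * tj for pj, tj in zip(p, t)]
    norm = sum(weighted)
    if norm == 0:
        raise ValueError("observation has zero probability in this state")
    return [w / norm for w in weighted]


# Hypothetical two-state example: state 0 is the start, state 1 the goal.
P = [[0.5, 0.5],
     [0.0, 1.0]]
K1 = [1.0, 0.0]
K2 = forward_project(K1, P)          # equal weight on both states
K3 = sense_update(K2, [0.2, 0.8])    # a reading that favours state 1
```

The sum in `forward_project` plays the role of the union in the non-deterministic setting, and the product in `sense_update` plays the role of the intersection.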

As we have already indicated, a sensory interpretation $I$ corresponding to some
observed sensor value $s^*$ is of the form

$I = (t_0, t_1, \ldots, t_n).$

Here $t_j$ is the conditional probability of observing $s^*$ given that the state of the
system is $s_j$. The sensory interpretation $I$ changes a knowledge state from $K_1 =
\{p_0, p_1, \ldots, p_n\}$ to $K_2 = \{p_0 t_0/t, p_1 t_1/t, \ldots, p_n t_n/t\}$. This is just the probabilistic
equivalent of set intersections in the non-deterministic case. Note that the
normalization factor must be $t = \sum_{j=0}^{n} p_j t_j$ for the updated probabilities to sum to one.


112 


CHAPTER  3.  RANDOMIZATION  IN  DISCRETE  SPACES 


3.2.7  Connectivity  Assumption 

We  would  like  to  make  a  connectivity  assumption  that  ensures  that  the  goal  is 
reachable  from  each  possible  state  of  the  system.  In  the  probabilistic  setting  this 
assumption  amounts  to  the  condition  that  for  each  start  state  there  is  a  sequence 
of  arcs  with  non-zero  transition  probabilities  that  attains  the  goal.  In  the  non- 
deterministic  case  the  assumption  amounts  to  the  condition  that  even  in  a  worst-case 
scenario  there  is  always  some  sequence  of  arcs  that  leads  from  each  state  to  the  goal. 

The  purpose  of  this  connectivity  assumption  is  to  rule  out  massive  disasters  from 
which  recovery  is  impossible.  In  other  words,  there  are  no  non-goal  trap  states  or  trap 
subsets.  An  example  of  a  trap  is  a  snap-fit.  Other  examples  include  orienting  parts 
over  a  deep  lake  or  walking  in  a  tiger-filled  jungle.  Generally,  in  the  domain  of  tasks 
involving  the  manipulation  or  assembly  of  rigid  objects,  the  connectivity  assumption 
will  be  met  so  long  as  one  can  apply  arbitrary  forces  and  torques  on  the  objects  being 
manipulated. 

The  reason  for  ruling  out  such  massive  failures  is  to  prevent  randomized  strategies 
from failing irrecoverably. In a more general setting in which certain parts of the state
space  must  be  avoided  at  all  costs,  one  must  restrict  randomization  to  the  safe  part 
of  the  state  space.  If  this  is  not  possible,  then  randomization  should  not  be  applied. 

Probabilistic  Setting 

In  the  case  that  actions  are  specified  as  probabilistic  transitions,  the  connectivity 
assumption  amounts  to  verifying  that  the  transitive  closure  of  each  state  in  the 
induced  transition  graph  contains  a  goal  state.  The  transitive  closure  of  a  state 
in  a  directed  graph  is  the  set  of  all  states  reachable  from  that  state  by  some  path.  By 
the  induced  transition  graph  we  mean  the  directed  graph  whose  vertex  set  is  the  set 
S  of  underlying  states,  and  whose  directed  arcs  are  given  by  the  set  of  all  transition 
arcs  whose  associated  probabilities  are  non-zero.  This  set  is  computed  by  considering 
the  set  A  of  all  possible  actions. 
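
The probabilistic connectivity test can be sketched as a reachability search on the induced transition graph. The code below is a minimal sketch under assumptions: each action is given as a transition matrix over states numbered from zero, and the names `induced_graph`, `reachable` and `satisfies_connectivity`, together with the three-state examples, are hypothetical.

```python
from collections import deque


def induced_graph(actions):
    """Arcs i -> j for which some action has a non-zero probability."""
    n = len(next(iter(actions.values())))
    succ = {i: set() for i in range(n)}
    for P in actions.values():
        for i in range(n):
            for j in range(n):
                if P[i][j] > 0:
                    succ[i].add(j)
    return succ


def reachable(succ, start):
    """Transitive closure of one state: everything reachable by some path."""
    seen = {start}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for nxt in succ[s]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def satisfies_connectivity(actions, goals):
    """True if the closure of every state contains some goal state."""
    succ = induced_graph(actions)
    return all(bool(reachable(succ, s) & goals) for s in succ)


# Hypothetical three-state tasks with goal state 2.
connected = {"A": [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]]}
trapped = {"A": [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]]}
```

In the second task state 0 is a non-goal trap state, so the connectivity assumption fails.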

Non-Deterministic  Setting 

In  the  situation  that  actions  are  specified  as  non-deterministic  transitions,  we  need 
a  stronger  condition  than  for  the  probabilistic  case.  In  the  probabilistic  case  we 
essentially  verify  the  possibility  of  moving  from  any  state  to  a  goal  by  looking  for 
some sequence of transitions connecting the state to the goal. Since each arc has a
positive  probability  of  being  executed,  the  sequence  as  a  whole  has  positive  probability 
of  being  executed,  so  it  is  possible  to  reach  the  goal  from  the  given  state.  In  the  non- 
deterministic  case,  such  a  test  is  not  sufficient.  This  is  because  some  arcs  appear 
in  the  diagram  simply  due  to  a  paucity  of  knowledge  in  modelling  the  underlying 
physical  process.  There  is  no  guarantee  that  the  arcs  will  ever  be  traversed.  [See  also 
the  section  on  adversaries  (§1.3)]. 

In order to understand the difference between the non-deterministic and the
probabilistic case consider figure 3.7. In both Part A and Part B, if one interprets
all the arcs as probabilistic arcs with positive transition probabilities then the task
satisfies the connectivity assumption. In other words, from any state there is a
sequence of transitions that attains the goal with non-zero probability. However,
if we interpret the arcs as worst-case transitions, then only the task of Part B satisfies
the connectivity assumption. In Part A, from a worst-case point of view, there is a
possibility that the system will forever loop between states $s_1$ and $s_3$.

Figure 3.7: Both tasks satisfy the connectivity assumption whenever the arcs have
positive probabilities of being executed. However, only the task of Part B satisfies
the connectivity assumption if the arcs are interpreted as worst-case non-deterministic
transitions.

Let us formalize the connectivity assumption. As we stated on page 112, even
in the worst case there should for each state exist a sequence of actions that leads
to the goal. Recall that a non-deterministic action $A$ can cause a given state $s$ to
transit non-deterministically to any one of a set of states $\{s_1, \ldots, s_k\}$. There is no
further information in the model, and one must thus be prepared that any one of the
transitions can occur. That is what is meant by a worst-case model. We will refer to
an instantiation of such a non-deterministic transition as a particular choice $s_i$. In
other words, on a particular execution of action $A$ while the system is in state $s$, the
result is instantiated as state $s_i$. By an instantiation of all possible actions we mean
a choice $s_i$ for all actions $A \in \mathcal{A}$ at all possible states $s \in S$. An instantiation of all
possible actions yields a directed graph whose vertex set is $S$ and whose arcs are the
directed arcs defined by the instantiation. We will refer to a particular such graph
as an instantiated transition graph. Figure 3.8 shows the four instantiated transition
graphs that are possible by instantiating in all possible ways the non-deterministic
actions of the graph in Part A of figure 3.7. Notice that for one of the graphs, two
states are disconnected from the goal. This says that in a worst-case scenario it
might not be possible to reach the goal. As we shall see, this also says that there is
no perfect-sensing strategy for attaining the goal from an arbitrary state.

Definition.  We  will  say  that  it  is  certainly  possible  to  reach  a  set  of  goal  states  Q 
from  a  given  state  s  if  for  any  instantiated  transition  graph  there  is  some  path  that 
leads  from  the  state  s  to  some  goal  state  in  Q. 

This  definition  captures  the  notion  that  no  matter  how  the  world  behaves  within 
the  non-determinism  allowed  by  the  specified  actions,  there  is  some  path  for  attaining 
the  goal.  The  definition  says  nothing  about  whether  the  system  can  actually  compute 
that path or execute it. After all, the system is not necessarily aware of the actual
instantiations  of  the  actions  it  executes.  The  connectivity  assumption  merely  says 
that  it  is  “certainly  possible”  to  attain  the  goal,  that  is,  that  no  adversary  can  prevent 
it  for  certain. 
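
The definition can be checked directly on small problems by enumerating every instantiated transition graph. The sketch below assumes that non-deterministic actions are stored as a dictionary from (state, action) pairs to sets of possible results. The two toy tasks are hypothetical and merely in the spirit of the looping and non-looping cases discussed above; they are not a transcription of the figures.

```python
from itertools import product


def instantiated_graphs(F):
    """Yield every way of fixing one result for each (state, action) pair."""
    keys = sorted(F)
    for choice in product(*(sorted(F[k]) for k in keys)):
        yield dict(zip(keys, choice))


def reach(succ, start):
    """All states reachable from start in the graph succ."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in succ[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def certainly_possible(F, states, goals):
    """True if every instantiated graph has a path from every state to a goal."""
    for inst in instantiated_graphs(F):
        succ = {s: set() for s in states}
        for (s, _action), result in inst.items():
            succ[s].add(result)
        if any(not (reach(succ, s) & goals) for s in states):
            return False
    return True


states = {"s1", "s3", "g"}
goals = {"g"}
# Hypothetical task in which s1 and s3 may loop forever in the worst case.
F_loop = {("s1", "A"): {"s3", "g"}, ("s3", "A"): {"s1", "g"}, ("g", "A"): {"g"}}
# Hypothetical task in which s3 always moves to the goal.
F_safe = {("s1", "A"): {"s3", "g"}, ("s3", "A"): {"g"}, ("g", "A"): {"g"}}
```

The first task has four instantiated graphs, one of which disconnects two states from the goal. Exhaustive enumeration grows exponentially with the number of non-deterministic transitions, so this is a way of reading the definition rather than a practical test.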

Looking  ahead  slightly,  this  connectivity  assumption  facilitates  the  use  of 
randomized  strategies.  This  is  because  a  system  can  randomly  guess  what  the 
instantiated  graph  looks  like.  Having  made  its  guess,  the  system  can  execute  a 
sequence  of  actions  that  follows  a  path  to  the  goal.  If  the  system  guessed  correctly, 
then  these  actions  attain  the  goal.  Otherwise,  the  system  fails  to  attain  the  goal,  but 
can  try  again.  The  connectivity  assumption  ensures  that  on  each  guess  there  is  a 
non-zero  probability  of  guessing  correctly,  uniformly  bounded  away  from  zero.  Thus, 
eventually,  the  system  will  guess  correctly. 

Figure 3.8: These four instantiated transition graphs describe the different possible
worst-case scenarios for the task of Part A of figure 3.7. The absence of a path to
the goal for states $s_1$ and $s_3$ in the fourth graph indicates that it is not “certainly
possible” to reach the goal from an arbitrary state in the state space.

In fact, it turns out that the connectivity assumption in the non-deterministic
setting is equivalent to the existence of a guaranteed perfect-sensing strategy. This
is proved below. Furthermore, the perfect-sensing strategy need not have more steps
than there are states.

Connectivity  Tests 

Thus  in  both  cases  we  have  a  simple  test  for  verifying  goal  connectivity.  In  the 
probabilistic  case  the  test  involves  computing  transitive  closures.  In  the  non- 
deterministic  case  the  test  consists  of  searching  for  a  perfect-sensing  strategy.  This 
may  be  done  quickly,  using  dynamic  programming,  as  explained  in  section  3.2.4. 
Notice  that  the  probabilistic  test  need  not  yield  an  optimal  strategy,  and  the  non- 
deterministic  test  need  not  yield  a  guaranteed  strategy  for  an  arbitrary  sensing 
function.  The  probabilistic  test  merely  yields  some  strategy,  while  the  non- 
deterministic  test  yields  a  guaranteed  strategy  given  a  perfect  sensor.  The  tests, 
and  hence  the  assumption,  are  definitely  weaker  than  the  general  planning  problem 
itself. 


Goal  Reachability  and  Perfect  Sensing 

And  now  the  two  claims.  The  first  establishes  the  equivalence  between  goal 
reachability  and  perfect-sensing  strategies,  the  second  shows  that  a  guaranteed 
strategy  under  perfect  sensing  requires  few  steps. 

Claim 3.4 Let $(S, \mathcal{A}, E, Q)$ be a discrete planning problem, where $S$ is the set of
states, $\mathcal{A}$ is the set of actions, $E$ is the sensing function, and $Q$ is the set of goal
states.

It is “certainly possible” to reach $Q$ from any state $s \in S$ if and only if there exists
a guaranteed perfect-sensing strategy for attaining $Q$ from any state $s \in S$.

Proof.  First,  suppose  that  there  exists  a  perfect-sensing  strategy  that  is  guaranteed 
to  move  the  system  from  any  state  s  to  some  goal  state.  Then  for  any  instantiated 
transition graph there must be a path from $s$ to $Q$. This path may be determined by
executing  the  perfect-sensing  strategy  while  selecting  action  transitions  as  prescribed 
by  the  instantiated  transition  graph. 

Conversely, suppose that for any instantiated transition graph and any state $s \in S$
there is a path from $s$ to $Q$. We would like to exhibit a perfect-sensing strategy for
attaining the goal $Q$ from any state $s \in S$.

We will construct a collection of sets of states $S_0, \ldots, S_q$, for some $q < |S|$. The
intuition behind these sets is that a state is in $S_i$ if there exists a perfect-sensing
strategy for attaining the goal in at most $i$ steps, and if there is some possible
instantiated transition graph for which $i$ steps are actually required. We will not
actually require this property in the current proof. However it provides the proper
intuition, and it will reappear in the proofs of claims 3.5 and 3.12.


Define $S_0$ to be the goal set $Q$. Clearly there is a perfect-sensing strategy that
attains a goal state from any state in $S_0$, requiring zero steps. Suppose that $S_k$ has
been defined, and that there exists a perfect-sensing strategy defined on the union
$\bigcup_{i=0}^{k} S_i$. The perfect-sensing strategy is assumed to attain a goal state from any state
in the union without ever passing through any state in the complement $S - \bigcup_{i=0}^{k} S_i$.
Define $S_{k+1}$ to be the set of all states in this complement for which there exists some
action that attains a state in $\bigcup_{i=0}^{k} S_i$ in a single step. In other words,

$S_{k+1} = \left\{ s \in S - \bigcup_{i=0}^{k} S_i \;\middle|\; F_A(s) \subseteq \bigcup_{i=0}^{k} S_i \text{ for some action } A = A(s) \right\}.$


We need to show that $S_{k+1}$ is not empty, unless $S = \bigcup_{i=0}^{k} S_i$. Once we establish
this, then the existence of a perfect-sensing strategy on the union $\bigcup_{i=0}^{k+1} S_i$ will be
clear. In particular, this new perfect-sensing strategy is an extended version of the
previous strategy. It executes the same actions as before for states in $\bigcup_{i=0}^{k} S_i$ while
executing the actions $A(s)$ for each $s \in S_{k+1}$. Clearly this strategy attains a goal
state from any state in the union $\bigcup_{i=0}^{k+1} S_i$ without ever passing through states in the
complement of this union.

Furthermore, since each set $S_i$ is non-empty, there can be at most $|S|$ of them.

Now let us show that $S_{k+1}$ is indeed non-empty. Let us write $C_k = S - \bigcup_{i=0}^{k} S_i$.
Suppose that $S_{k+1} = \emptyset$, but that $C_k \neq \emptyset$. This says that for every state $s \in C_k$ and
every action $A$, the intersection of the forward projection $F_A(s)$ with $C_k$ is non-empty.
Said differently, for each state $s \in C_k$, and each action $A$, there is an instantiation
that causes $s$ to traverse to a state in $C_k$. This means that there is an instantiated
transition graph for which the set $C_k$ is completely disconnected from the goal. That
violates the assumption of the claim, and thus we see that $S_{k+1} \neq \emptyset$. ∎
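
The layered construction used in this proof translates directly into a backchaining procedure. What follows is a minimal sketch, assuming that forward projections are stored as a dictionary from (state, action) pairs to sets of possible results; the function name `backchain` and the two small tasks are hypothetical. The procedure returns the layers together with a perfect-sensing strategy defined on their union, so a guaranteed strategy from every state exists exactly when the layers cover the whole state space.

```python
def backchain(states, actions, F, goals):
    """Build layers S_0, S_1, ... and a state -> action strategy."""
    covered = set(goals)
    layers = [set(goals)]
    strategy = {}
    while True:
        new_layer = {}
        for s in states - covered:
            for a in actions:
                results = F.get((s, a))
                # Every possible outcome must land in the covered set.
                if results and results <= covered:
                    new_layer[s] = a
                    break
        if not new_layer:
            break
        strategy.update(new_layer)
        covered |= set(new_layer)
        layers.append(set(new_layer))
    return layers, strategy


states = {"s1", "s3", "g"}
goals = {"g"}
# Hypothetical task with a guaranteed strategy: s3 always reaches the goal.
F_safe = {("s1", "A"): {"s3", "g"}, ("s3", "A"): {"g"}}
# Hypothetical task without one: s1 and s3 may loop in the worst case.
F_loop = {("s1", "A"): {"s3", "g"}, ("s3", "A"): {"s1", "g"}}

layers_safe, strategy_safe = backchain(states, ["A"], F_safe, goals)
layers_loop, strategy_loop = backchain(states, ["A"], F_loop, goals)
```

Each pass adds at least one state or stops, so the number of layers never exceeds the number of states, in agreement with the bound noted in the proof.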


The  next  claim  establishes  that  a  perfect-sensing  strategy  for  attaining  a  goal  need 
not  be  very  long.  The  claim  actually  follows  from  the  proof  of  the  previous  claim. 
However,  for  completeness  we  will  prove  it  independently. 


Claim 3.5 Let $(S, \mathcal{A}, E, Q)$ be a discrete planning problem, with $E$ being a
perfect-sensing function. Suppose that there exists a guaranteed strategy for moving from some
start state $s$ to the goal set $Q$. Then this strategy requires no more than $r = |S| - |Q|$
steps.


Proof. This is a standard finite automaton argument. Suppose that more than
$r$ steps are required. Consider a possible trace of states that occur as the strategy
is executed. This trace must then contain a subsequence of non-goal states in which
the first and last state are the same state, say state $s$. Let $A_1$ be the action executed
when the system is first in state $s$, and let action $A_2$ be the action executed when the
system encounters state $s$ at the end of this subsequence. Since sensing is perfect,
the strategy will continue to be successful if action $A_1$ is replaced by action $A_2$. This
change removes the subsequence from the trace, thus shortening this particular trace
by at least one step. Repeatedly applying this procedure to all possible traces shows
that the strategy need not require more than $r$ steps. ∎


3.3 Perspective

We  noted  in  section  3.2.3  that  the  general  problem  of  planning  optimal  strategies  on 
discrete  spaces  is  very  hard  computationally.  This  suggests  several  different  directions 
to  take.  One  is  to  give  up  the  notion  of  optimality.  Another  is  to  examine  special 
cases,  and  to  try  to  understand  the  characteristics  that  permit  fast  solutions.  The 
next  few  sections  will  address  these  issues  in  the  probabilistic  setting. 

Finally,  as  indicated  earlier,  for  many  problems  the  action  transitions  are  not 
probabilistic  but  rather  non-deterministic.  In  these  situations,  the  Markov  Decision 
model is not directly applicable. The approach for several years has been to
compute  what  are  often  known  as  guaranteed  strategies.  These  are  strategies  that  are 
guaranteed  to  attain  a  goal  state  in  a  fixed  number  of  steps,  despite  uncertainty.  The 
strategies  are  computed  by  backchaining.  In  the  perfect-sensing  case,  this  amounts  to 
using  dynamic  programming,  with  a  value  function  that  can  only  take  on  the  boolean 
values 0 and 1. However, not all problems admit guaranteed solutions. The latter
sections  of  the  chapter  will  look  at  how  randomization  may  be  used  to  solve  some  of 
these  problems. 


3.4  One-Dimensional  Random  Walk 

In  studying  randomized  strategies  on  discrete  spaces,  it  is  worthwhile  to  start  by 
considering  some  very  simple  problems,  such  as  the  one-dimensional  random  walk.  It 
turns  out  that  the  insight  into  convergence  speeds  that  one  gains  from  looking  at  a 
one-dimensional  setting  carries  over  to  some  extent  into  the  general  setting. 


3.4.1  Two-State  Task 

The  simplest  possible  non-trivial  example  is  given  by  a  system  consisting  of  two 
states,  with  a  probabilistic  transition  between  these  states.  This  was  essentially 
the  representation  in  the  gear-meshing  and  parts-sieving  examples  earlier.  For 
completeness,  let  us  quickly  review  the  results  of  the  earlier  discussion.  Let  us  say 
that  one  of  the  states  is  the  start  state,  and  the  other  is  the  goal  state,  and  that 
sensing  is  perfect.  This  means  that  whenever  the  system  is  in  a  state  s,  the  sensor 
accurately reports that the system is in state $s$. If the probability of transiting from
the start state to the goal state is $p$, then the expected time until the goal is attained
is $1/p$. Indeed, this is a classic waiting time problem: the probability that the goal is
attained on the $k$th try is $p q^{k-1}$, where $q = 1 - p$. In particular, for fixed $p$, convergence
is exponentially fast in the number of tries. [This is also known as linear convergence
or geometric convergence, since the ratio of successive error terms is bounded by a
constant less than one.]
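
The waiting time claim is easy to check numerically. The sketch below sums the series for the expected number of tries directly; the function names and the truncation at a fixed number of terms are assumptions made for illustration.

```python
def first_success_probability(p: float, k: int) -> float:
    """Probability that the goal is first attained on try k."""
    return p * (1.0 - p) ** (k - 1)


def expected_tries(p: float, terms: int = 10_000) -> float:
    """Truncated series sum_k k * p * q**(k-1), which approaches 1/p."""
    return sum(k * first_success_probability(p, k) for k in range(1, terms + 1))


p = 0.25
mean_tries = expected_tries(p)      # close to 1/p
failure_after_10 = (1.0 - p) ** 10  # probability of no success in 10 tries
```

With success probability one quarter the series sums to four tries on average, while the probability of still being in the start state shrinks by the factor $q$ with every try.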

A slightly more complicated problem is given if the sensing function is not perfect.
Different variations are possible. One possibility is that sometimes the sensor will
correctly register the state of the system, while at other times the sensor cannot
distinguish between the two states. If $p_{\text{sense}}$ is the probability of recognizing that
the system is in the goal, then the probability of entering and recognizing entry into
the goal is $p \, p_{\text{sense}}$, assuming independence between the actions and the sensors. This
raises the expected execution time by at most a factor of $1/p_{\text{sense}}$. Another possibility
is that the sensing function can never distinguish between the start and the goal when
the system is at the goal. In this case one cannot guarantee that the goal will be
attained, but one may be able to say something about the probability of attaining
the goal in a specific number of steps.

Figure 3.9: A two state Markov chain with probabilistic transitions. The chain
represents an approximation to the gear-meshing and sieving examples of section
1.2, given that the action to be executed is fixed.

At  issue  is  what  happens  to  the  system  if  it  is  in  the  goal  and  one  executes  the 
action  designed  to  move  to  the  goal.  Specifically,  one  is  interested  whether  the  system 
stays in the goal or whether it can jump back out of the goal. In the gear-meshing
example the question amounts to deciding whether spinning the gears when they are
meshed can cause them to disengage. In the sieving example, the question amounts to
deciding whether shaking the system after the object has fallen through the sieve can
cause it to jump back up above the sieve. See figure 3.9 for a probabilistic description.
The  probability  of  moving  out  of  the  goal  is  given  by  u.  This  is  zero  if  the  system 
remains  forever  in  the  goal  under  the  action  A,  and  non-zero  otherwise. 

The  ideal  situation  is  that  u  is  zero,  in  other  words,  that  the  goal  is  not  ever  exited 
once  attained.  In  this  case,  as  we  mentioned  above,  the  probability  of  not  attaining 
the goal in $k$ attempts is $q^k$, where $q = 1 - p$ is the probability that the system
stays in the start state when action $A$ is executed. So, if one wants the probability of
failure to be less than some constant $\epsilon$, then one should choose $k$ to be bigger than

$\log \epsilon / \log q.$
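
As a worked instance of this bound, the following sketch computes the smallest number of attempts for which the failure probability drops below a given tolerance. The function name `tries_needed` and the numerical values are hypothetical; the two short loops only guard against floating-point rounding at the boundary.

```python
import math


def tries_needed(p: float, eps: float) -> int:
    """Smallest integer k with (1 - p)**k < eps."""
    if not (0.0 < p < 1.0 and 0.0 < eps < 1.0):
        raise ValueError("expected 0 < p < 1 and 0 < eps < 1")
    q = 1.0 - p
    k = math.floor(math.log(eps) / math.log(q)) + 1
    # Correct for rounding error in the logarithms, if any.
    while k > 1 and q ** (k - 1) < eps:
        k -= 1
    while q ** k >= eps:
        k += 1
    return k


k_half = tries_needed(0.5, 0.01)    # a fair coin, 1% failure tolerance
k_tenth = tries_needed(0.1, 0.05)   # an unlikely transition, 5% tolerance
```

Both logarithms are negative, so their ratio is positive, and the required number of attempts grows only logarithmically as the tolerance shrinks.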

The worst case occurs when $u$ is one, that is, when the goal is immediately exited
after having been entered. Define $p_k$ to be the probability that the system is in the
goal on the $k$th try, and $q_k$ to be the probability that the system is not in the goal.




Then the following system of equations holds:

$q_{k+1} = q \, q_k + p_k,$

$p_{k+1} = p \, q_k,$

with boundary conditions

$q_0 = 1, \quad p_0 = 0.$

Since $q_k + p_k = 1$, we see that the strategy which attains the goal with highest
probability consists of a single attempt. In order to see this, notice that $p_0 = 0$ and
that $p_1 = p$. Observe further that $p_k > 0$ for all $k > 0$. Thus $q_k < 1$ for all $k > 0$.
This in turn says that $p_k < p$ for all $k > 1$. In other words, after the first trial the
probability of success decreases.

If one does repeatedly try to attain the goal, so that $k$ becomes very large, then
$q_k$ and $p_k$ approach a limiting distribution, namely

$q_k \to \frac{1}{1+p}, \qquad p_k \to \frac{p}{1+p}.$

Let us briefly consider the general case. All this material is standard in the theory
of Markov chains. See, for instance, [Feller1] and [KT1] for further introductions.
Denote the start state as state 1 and the goal as state 2. Let $P = (p_{ij})$ be the
probability transition matrix, where $p_{ij}$ is the probability of transiting from state i
to state j in a single step. We have that

$$P = \begin{pmatrix} 1-p & p \\ u & 1-u \end{pmatrix}.$$

The kth power of this matrix, $P^k$, describes the k-step transition probabilities.
If the row vector $x_0$ describes the initial probability distribution over the system
states, then $x_k = x_0 P^k$ describes the resulting probability distribution after k steps.
In our case $x_0 = (1, 0)$, meaning that the system starts off in state 1. The theory
of Markov chains tells us that as the number of steps gets large $x_k$ approaches a
limiting distribution x, which is a left eigenvector of the matrix P, with eigenvalue
1. Furthermore, under fairly simple conditions (such as non-periodicity), the chain
converges to this distribution at the rate $\lambda^k$, where $\lambda$ is the largest eigenvalue whose
norm is less than one (all eigenvalues have norm no more than one). So convergence
is exponentially fast in the number of steps taken. It is clear that the vector
$x = (u/(u+p),\; p/(u+p))$ is a left eigenvector of the matrix P, with eigenvalue 1. Thus
x forms the limiting distribution as one repeatedly executes action A. Furthermore,
the eigenvalue other than 1 is $\lambda = 1 - p - u$, and convergence occurs geometrically
fast, with $\lambda$ as base. Indeed, if we write the difference at any point in time between
the limiting distribution and the current distribution as $e_k = x - x_k$, then $e_k$ is of the
form $(\epsilon, -\epsilon)$, for some $\epsilon$ between $-1$ and 1. Furthermore, $e_{k+1} = e_k P$, by definition of
the limiting distribution. Performing this multiplication, we see that $e_{k+1} = \lambda e_k$, as
one would hope. When this second eigenvalue is negative, that is, when u + p > 1 (as in
the worst case u = 1), this also shows that the strategy which maximizes the probability
of attaining the goal, given that one starts out in the start state, is given by a single
application of action A. Further applications of A only reduce the probability of being
in the goal below p, eventually settling at the stable distribution value of p/(u + p).
(When u + p < 1 the eigenvalue is positive, and repeated applications instead raise the
probability monotonically from p towards p/(u + p).) Of course, if one isn't sure
whether the system initially starts in the (so-called) start state or in the goal
state, then a single application of A may not be the right thing.
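The two-state analysis can be checked numerically. The following sketch applies action A repeatedly and records the error in the first coordinate; the helper names and the values p = 1/2, u = 3/4 are assumptions made purely for illustration.

```python
from fractions import Fraction

def step(x, p, u):
    """One application of A: x -> x P with P = [[1-p, p], [u, 1-u]]."""
    return (x[0] * (1 - p) + x[1] * u, x[0] * p + x[1] * (1 - u))

def limiting_distribution(p, u):
    """Left eigenvector of P for eigenvalue 1, normalized to sum to one."""
    return (u / (u + p), p / (u + p))

p, u = Fraction(1, 2), Fraction(3, 4)    # illustrative values only
lam = 1 - p - u                          # the eigenvalue other than 1
x_inf = limiting_distribution(p, u)

x = (Fraction(1), Fraction(0))           # start in state 1
errors = []                              # first coordinate of e_k = x_inf - x_k
for _ in range(5):
    errors.append(x_inf[0] - x[0])
    x = step(x, p, u)
```

Here the second eigenvalue is -1/4, so each recorded error is -1/4 times the previous one: the distribution alternates around (3/5, 2/5) while closing in on it geometrically.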

If we apply this analysis to the case that u is zero, we see that the system has
eigenvalues 1 and q, and a limiting distribution of x = (0, 1). This says that the
goal is eventually attained, and that convergence is geometric with base q, agreeing
with our earlier calculations. In the case that u is one, the eigenvalues are 1 and
$-p$. Again, the limiting distribution is x = (1/(1 + p), p/(1 + p)), as we saw earlier,
and convergence to this distribution is geometric with base $-p$. The negative sign
indicates oscillatory behavior of the error vector.

For  a  given  action  we  now  have  a  means  of  computing  the  probability  of  winding 
up  in  the  goal  on  any  given  step.  Or,  more  generally,  without  knowing  anything 
about  the  initial  distribution  that  determines  the  state  of  the  system,  we  can  say 
that  after  sufficiently  many  applications  of  action  A  the  system  will  attain  a  stable 
distribution.  In  particular,  after  sufficiently  many  steps  the  goal  will  be  attained  with 
probability close to p/(u + p). While this is a far cry from guaranteeing that the goal
will  be  attained,  it  is  considerably  better  than  claiming  that  the  task  is  not  doable 
in  the  absence  of  a  guaranteed  strategy.  In  particular,  if  the  goal  represents  the 
preconditions  to  some  other  task,  then  one  has  a  means  of  at  least  probabilistically 
meeting  those  preconditions,  and  of  passing  on  a  probability  of  their  having  been 
met  to  the  next  task.  Said  differently,  one  can  think  of  the  repeated  execution  of 
action  A  as  randomizing  between  two  states,  of  which  only  one  permits  solution  of 
some  additional  task.  With  good  sensing  the  randomization  is  not  needed,  but  with 
no  sensing,  the  randomization  offers  a  means  of  solving  the  task  without  knowing 
whether  the  system  first  starts  off  in  the  goal  or  in  the  (so-called)  start  state. 

Suppose  that  several  different  actions  are  possible.  Then  this  analysis  provides  a 
means  for  comparing  the  actions  in  terms  of  their  probabilities  of  success  or  in  terms 
of  their  convergence  times. 
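One possible way to carry out such a comparison is sketched below. The two actions, their names, and their (p, u) values are hypothetical; each is summarized by its limiting goal probability p/(u + p) and by the number of steps needed for the geometric error to fall below a tolerance.

```python
def limiting_goal_probability(p: float, u: float) -> float:
    """Long-run probability of being in the goal under repeated execution."""
    return p / (u + p)

def steps_to_settle(p: float, u: float, tol: float = 1e-3) -> int:
    """Steps, starting from the start state, until the goal probability is
    within tol of its limit. The error shrinks by |1 - p - u| per step."""
    lam = abs(1.0 - p - u)
    if p + u <= 0.0 or lam >= 1.0:
        raise ValueError("chain does not converge geometrically")
    err = limiting_goal_probability(p, u)   # size of the error at step 0
    k = 0
    while err >= tol:
        err *= lam
        k += 1
    return k

# Hypothetical actions, given as (p, u) pairs.
actions = {"tap": (0.3, 0.05), "shake": (0.6, 0.5)}
summary = {name: (limiting_goal_probability(p, u), steps_to_settle(p, u))
           for name, (p, u) in actions.items()}
```

In this made-up example the first action ends up in the goal more often in the long run, while the second settles to its (lower) limiting probability in far fewer steps, so the better choice depends on which criterion matters for the task.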

Furthermore,  the  approach  just  outlined  applies  to  a  general  Markov  chain.  The 
size  of  the  matrices  changes,  but  the  comments  regarding  limiting  distributions  and 
convergence  times  continue  to  hold.  We  can  thus  imagine  analyzing  and  comparing 
different  strategies  for  solving  a  sensorless  task  formulated  as  a  probabilistic  problem 
on  a  discrete  state  space. 


3.4.2  Random  Walks 

In order to motivate the analysis of random walks, consider the task of moving a peg
into a hole. Suppose that we are interested in generating a simple feedback loop that
senses the position of the peg relative to the hole, then moves the peg to decrease the
distance from the hole. In the case of perfect control and perfect sensing, the peg will
always move towards the hole. However, if sensing and control are subject to error,
then the sensors may occasionally suggest the wrong direction in which to move,
and the motions executed may occasionally move in the wrong direction or perhaps
accidentally slide over the hole. Recall the physical peg-in-hole example of section
1.1 and the analysis of section 2.4. In other words, the motion at any point is not
guaranteed to move towards the hole, but has some chance of moving in a different
direction. This sets the stage naturally for processes that may be approximated by
random walks. These are in general multi-dimensional, but often it is enough to
consider some one-dimensional quantity, such as the distance of the peg from the
hole. A more direct example is given by a two-dimensional peg-in-hole problem, in
which the peg is moving on a one-dimensional edge near the mouth of the hole.

Figure 3.10: This is a deterministic random walk. The system moves towards the left
one state during each time step.

Another  motivation  for  studying  random  walks  is  given  by  the  sensorless  tasks 
discussed  in  section  1.4.  Here  the  question  may  be  one  of  choosing  a  sequence 
of probabilistic actions that should attain some desired goal. The choice may be
deterministic,  so  that  the  random  character  of  the  system  arises  solely  from  the 
probabilistic  actions,  or  the  choice  of  actions  may  itself  involve  random  decisions 
at  execution  time. 

In  summary,  random  walks  on  graphs  arise  naturally  due  to  uncertain  sensing, 
uncertain  control,  and  purposeful  randomization.  We  are  interested  in  this  section 
in  determining  convergence  properties  of  one-dimensional  random  walks.  An 
understanding  of  these  properties  will  aid  in  constructing  strategies  for  more  general 
tasks. 

Figure  3.10  shows  a  simple  one-dimensional  random  walk.  The  state  space  consists 
of  a  + 1  states,  labelled  0, 1, . . . ,  a.  The  arrows  emanating  from  each  state  indicate  the 
possible  transitions  out  of  that  state  at  any  given  step  of  the  process.  The  arrows  are 
labelled  with  the  probability  of  their  occurrence.  State  0  is  the  goal.  This  is  actually 
a deterministic random walk: At each step, if the process is in state k, then it will
transit to state k - 1 with probability one. Once the process has entered state 0,
it remains there. In short, for this deterministic random walk the expected time to
reach the goal from state k is just k, the distance to the origin. This is the type of
behavior one expects with perfect control or sensing.

Figure 3.11: This is a random walk, in which the system moves left with probability
p, and sits still with probability q = 1 - p.

A  slight  variation  is  given  by  the  random  walk  in  figure  3.11.  In  this  example  the 
transition from state k to state k - 1 only has probability p, while with probability
q = 1 - p the process remains in state k. An example of such a process might be a
series  of  sieves  stacked  one  above  the  other  (recall  section  1.2).  Once  the  object  has 
passed  through  one  sieve,  it  will  not  move  back  up,  but  it  need  not  immediately  pass 
through  the  next  sieve.  Another  example  mentioned  earlier  was  the  task  of  closing  a 
desk  drawer  that  is  slightly  wedged.  In  many  cases  it  may  be  enough  to  keep  trying 
to  push  the  drawer  shut,  without  ever  having  to  pull  it  out.  The  probability  p  models 
the  probability  of  selecting  a  pushing  force  that  actually  closes  the  drawer  further. 

The  expected  convergence  time  for  the  process  is  now  k/p,  if  it  initially  starts  in 
state  k.  One  sees  therefore  that  the  transition  probability  acts  almost  like  a  velocity. 
In  this  example  the  velocity  is  p;  in  the  previous  example  it  was  1.  Later  we  will 
generalize  this  notion  of  velocity  to  a  more  encompassing  setting. 

One  final  comment  concerns  the  search  for  paths  to  the  goal.  If  one  simply 
employed  a  connectivity  analysis,  one  would  see  that  the  goal  is  reachable  from  state 
k  by  a  sequence  of  length  k.  The  probability  that  this  precise  sequence  will  actually 
be executed is $p^k$, which suggests horrible convergence times. Fortunately, however,
because  progress  along  the  chain  cannot  be  arbitrarily  undone,  the  actual  convergence 
times  are  much  faster. 
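The contrast between the two estimates can be made concrete with exact arithmetic. In this sketch (function name and numbers are illustrative assumptions) the expected times for the walk of figure 3.11 are obtained from the one-step equation $D_k = 1 + q D_k + p D_{k-1}$, which rearranges to $D_k = D_{k-1} + 1/p$.

```python
from fractions import Fraction

def lazy_walk_times(p: Fraction, a: int) -> list:
    """D_0..D_a for the walk that moves left with probability p and
    otherwise stays put; each level costs 1/p steps on average."""
    times = [Fraction(0)]
    for _ in range(a):
        times.append(times[-1] + 1 / p)
    return times

p = Fraction(1, 4)
D = lazy_walk_times(p, 10)
direct_path_prob = p ** 10   # chance that ten consecutive steps all succeed
```

With p = 1/4 and k = 10, the direct path has probability $4^{-10}$, roughly one in a million, yet the expected time to reach the goal is only 40 steps.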

We will now derive the convergence times of a fairly general random walk. For
the most part, we will follow [Feller1] in this analysis (see in particular pp. 348-349),
although our boundary conditions are slightly different. As usual, transitions are
possible only to neighbor states, and we will assume that the probabilities are the
same for all interior states. In particular, p is the probability of moving left, and
q = 1 - p is the probability of moving right. We are not considering self-transition
probabilities. If these are included then the results are nearly identical. In the case
that p = q the expected durations are scaled by 1/(p + q) from those given here. In
the asymmetric case, the results are identical to those given here, except that q and
p no longer add to one. We assume further that the process stops in state 0, and
reflects at state a. In other words, instead of moving right with probability q from
state a, the process simply stays in state a with probability q. See figure 3.12.

Figure 3.12: This is a random walk, in which the system moves left with probability
p, and right with probability q = 1 - p. The random walk stops in state 0 and reflects
at state a.

Now let $D_k$ be the expected time to reach the goal (state 0), given that the system
starts in state k, with $0 \le k \le a$. Suppose the system starts in state k with $0 < k < a$,
and consider the results of its first step. With probability p the system will move to
the left, at which point the remaining expected time is $D_{k-1}$, and with probability q
the system moves to the right, whereupon the remaining expected time to reach the
goal is $D_{k+1}$. This establishes the following difference equation.²

(3.5)  $D_k = q\,D_{k+1} + p\,D_{k-1} + 1, \qquad 0 < k < a,$

with boundary conditions

(3.6)  $D_0 = 0, \qquad D_a = q\,D_a + p\,D_{a-1} + 1.$

Let us first suppose that $p \ne q$. Then a general solution to equation (3.5) is given
by

(3.7)  $D_k = \frac{k}{p-q} + A + B\left(\frac{p}{q}\right)^k, \qquad \text{when } p \ne q,$

where A and B are arbitrary constants. In our case, these are determined by the
boundary conditions (3.6). In particular,

$$D_0 = 0 \;\Rightarrow\; A + B = 0,$$

and

$$D_a = q\,D_a + p\,D_{a-1} + 1 \;\Rightarrow\; D_a = D_{a-1} + \frac{1}{p},$$

which says that

$$\frac{a}{p-q} + A + B\left(\frac{p}{q}\right)^a = \frac{a-1}{p-q} + A + B\left(\frac{p}{q}\right)^{a-1} + \frac{1}{p}.$$

It follows that

$$B = -\frac{q}{(p-q)^2}\left(\frac{q}{p}\right)^a,$$

from which we see that the solution to (3.5) and (3.6) is given by

(3.8)  $D_k = \frac{k}{p-q} + \frac{q}{(p-q)^2}\left[\left(\frac{q}{p}\right)^a - \left(\frac{q}{p}\right)^{a-k}\right].$

² These equations follow Feller, but with different boundary conditions.

It is useful to rewrite this solution as

(3.9)  $D_k = \frac{k}{p-q} + \frac{q}{(p-q)^2}\left(\frac{q}{p}\right)^a\left[1 - \left(\frac{p}{q}\right)^k\right].$

Suppose now that p > q (so, in some sense, the “natural drift” is towards the
origin). Then the factor $1 - (p/q)^k$ is negative. So

$$D_k < \frac{k}{p-q}, \qquad 0 < k \le a,$$

and we see that convergence is essentially linear in the distance from the origin. In
fact, if a is large and $k \ll a$, then $D_k \approx k/(p-q)$.
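The closed form can be cross-checked against a direct solution of the difference equation. The sketch below is one way to do this, under the assumption that the solution has the form derived above; the function names are ours. Writing $d_k = D_k - D_{k-1}$, the reflecting boundary gives $d_a = 1/p$ and equation (3.5) gives $d_k = (q\,d_{k+1} + 1)/p$, so the times follow from a backward sweep and a running sum.

```python
from fractions import Fraction

def expected_times(p: Fraction, a: int) -> list:
    """Solve (3.5)-(3.6) exactly via the increments d_k = D_k - D_{k-1}."""
    q = 1 - p
    d = [Fraction(0)] * (a + 1)
    d[a] = 1 / p                       # reflecting boundary at state a
    for k in range(a - 1, 0, -1):
        d[k] = (q * d[k + 1] + 1) / p  # interior equation, rearranged
    times = [Fraction(0)]              # D_0 = 0
    for k in range(1, a + 1):
        times.append(times[-1] + d[k])
    return times

def closed_form(p: Fraction, a: int, k: int) -> Fraction:
    """D_k = k/(p-q) + q/(p-q)^2 * (q/p)^a * (1 - (p/q)^k), for p != q."""
    q = 1 - p
    return k / (p - q) + q / (p - q) ** 2 * (q / p) ** a * (1 - (p / q) ** k)

p, a = Fraction(2, 3), 6
D = expected_times(p, a)
```

For p = 2/3 the drift velocity is p - q = 1/3, and every computed $D_k$ lies strictly below 3k, as the bound above requires.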

Now, suppose that q > p, so the “natural drift” is away from the goal. This time
the factor $1 - (p/q)^k$ is positive, and the factor $(q/p)^a$ becomes significant. Indeed,
for large a (and moderate to large k), the expected durations are essentially

$$D_k \approx \frac{q}{(p-q)^2}\left(\frac{q}{p}\right)^a.$$

In  other  words,  convergence  is  exponential  in  the  length  of  the  random  walk.  For 
small  k,  this  time  is  reduced  slightly,  but  it  is  still  of  the  same  order. 

Finally, let us consider the case for which p = q = 1/2. Then the general solution
to the difference equation (3.5) becomes

$$D_k = -k^2 + A + Bk.$$

The first boundary condition implies that A is zero, while the second boundary
condition says that $D_a = D_{a-1} + 2$, from which we see that B = 2a + 1. So, the
complete solution is

$$D_k = k\,(2a + 1 - k).$$

In other words, the convergence times are essentially quadratic. In particular, for
values of k comparable to the length of the chain a, the convergence times are
essentially $a^2$, whereas for smaller values of k the convergence times are on the order
of ka.

These  observations  establish  the  following 

Claim 3.6 Consider a random walk on the state space 0, 1, ..., a, with reflection at
a. Let p be the probability of moving left one unit, and let q = 1 - p be the probability
of moving right one unit. Then the maximum expected time to attain the origin is
linear in a, quadratic in a, or exponential in a, depending on whether p > q, p = q,
or p < q, respectively.

Furthermore, for a fixed starting location k, the expected time to attain the origin
from k approaches k/(p - q) as $a \to \infty$ if p > q, and approaches infinity if p < q.
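The three regimes of the claim show up clearly when the worst-case time $D_a$ is tabulated for growing a. The sketch below is illustrative only (the function name and the sample probabilities are assumptions); it accumulates the increments $D_k - D_{k-1}$ from the reflecting end downwards.

```python
from fractions import Fraction

def worst_case_time(p: Fraction, a: int) -> Fraction:
    """D_a: expected time to reach state 0 starting at the reflecting end."""
    q = 1 - p
    d = 1 / p                 # D_a - D_{a-1}
    total = d
    for _ in range(a - 1):
        d = (q * d + 1) / p   # next increment, one state closer to the goal
        total += d
    return total

for p in (Fraction(3, 4), Fraction(1, 2), Fraction(1, 4)):
    print(p, [float(worst_case_time(p, a)) for a in (4, 8, 16)])
```

Doubling a roughly doubles the time when p = 3/4, roughly quadruples it when p = 1/2, and multiplies it by about $3^a$ when p = 1/4, where q/p = 3.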

We  see  then  that  it  is  important  for  a  random  walk  to  drift  in  the  correct  direction. 
If  at  each  point  in  time  the  tendency  is  to  move  towards  the  goal,  then  the  random 
walk  behaves  very  much  like  a  deterministic  process.  Specifically,  the  expected  time 
to  reach  a  goal  is  essentially  the  distance  to  the  goal  divided  by  the  expected  velocity 
at  which  the  process  is  moving.  In  the  random  walk  case,  the  quantity  p  —  q  measures 
this  expected  velocity.  On  the  other  hand,  if  the  expected  velocity  is  pointing  in 
the  wrong  direction,  then  the  goal  will  still  be  attained  eventually  (assuming  that 
the  state  space  is  finite),  due  purely  to  randomness.  Now,  however,  the  exponential 
character  of  having  to  perform  several  operations,  each  of  which  succeeds  only  with 
some  probability,  becomes  dominant. 




Figure  3.13:  A  bounded  two-dimensional  grid,  with  the  goal  at  the  origin. 


3.4.3  General  Random  Walks 

Thus  far  we  have  looked  only  at  random  walks  for  which  the  transition 
probabilities  are  identical  over  all  the  (interior)  states.  If  one  varies  the  probabilities, 
then  one  can  obtain  mixtures  of  the  three  types  of  random  walks  discussed  thus  far. 
For  instance,  if  some  of  the  local  velocities  point  away  from  the  origin,  whereas  most 
point  towards  the  origin  or  at  least  are  zero,  then  one  can  obtain  convergence  times 
that  are  worse  than  linear  or  quadratic,  but  do  not  yet  approach  the  exponential 
character  of  a  random  walk  for  which  all  velocities  point  away  from  the  origin. 
Examples  in  which  this  type  of  behavior  arises  naturally  are  given  by  random  walks 
in  higher  dimensions.  For  instance,  consider  the  two-dimensional  grid  of  figure  3.13. 
Consider  a  two-dimensional  random  walk  on  this  grid,  in  which  transitions  occur  only 
to  immediate  neighbor  points,  each  with  probability  1/4,  and  reflection  occurs  at  the 
boundary.  Suppose  the  origin  is  the  goal,  and  consider  the  one-dimensional  quantity 
given  by  distance  from  the  origin,  measured  as  Manhattan  distance.  For  points  off 
the  horizontal  and  vertical  axes,  two  of  the  four  possible  transitions  decrease  the 
distance  to  the  goal,  while  two  increase  the  distance  from  the  goal.  The  expected 
change  in  distance  from  the  origin  is  in  fact  zero,  that  is,  the  “drift  velocity”  relative 
to  the  origin  is  zero.  On  the  other  hand,  for  points  on  either  of  the  axes,  only  one 
transition decreases the distance from the origin, while three increase the distance.
The expected change in distance is in fact +1/2, that is, the drift velocity points away
from the origin. Fortunately a point on one of the axes has probability 3/4 of either
moving off the axis or of moving closer to the goal. Thus, even though the natural
drift on the axes is away from the origin, the system cannot get stuck on the axes,
so that one does not see an exponential convergence time. This particular mixture
of velocities that are either zero or point away from the origin yields a maximum
expected convergence time that is on the order of $a^2 \log a$. Here the grid has edge
length a (see [Montroll]). This is slightly worse than the quadratic convergence time
for the one-dimensional random walk in which all the (interior) local velocities were
zero, but not so bad as the case in which all the velocities actually pointed away from
the goal. In higher dimensions, the mixture gets slightly worse, so that on a grid in
d dimensions the maximum expected convergence time is on the order of $a^d$, which is
the grid size. All of these times are still polynomial in a.
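The drift values quoted for the two-dimensional grid can be reproduced with a few lines of exact arithmetic. This sketch assumes a grid centred on the origin with coordinates from -h to h (the half-width `h` is our parameter, not the thesis's edge length a), and it assumes that reflection means a step off the grid leaves the state unchanged, by analogy with the one-dimensional case.

```python
from fractions import Fraction

def manhattan_drift(x: int, y: int, h: int) -> Fraction:
    """Expected one-step change in |x| + |y| for the symmetric walk on the
    grid {-h..h} x {-h..h}, each neighbor chosen with probability 1/4."""
    here = abs(x) + abs(y)
    total = Fraction(0)
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if abs(nx) > h or abs(ny) > h:
            nx, ny = x, y          # assumed reflection rule: stay in place
        total += Fraction(1, 4) * ((abs(nx) + abs(ny)) - here)
    return total
```

For an interior point off the axes, such as (2, 3) with h = 5, the function returns 0; for an interior point on an axis, such as (2, 0), it returns 1/2.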

3.4.4  Moral:  Move  Towards  the  Goal  on  Average 

In  the  previous  examples  the  natural  drift  was  either  zero  or  it  pointed  away  from 
the  origin.  In  order  to  attain  expected  velocities  that  point  towards  the  origin,  one 
needs  some  mechanism  that  naturally  skews  the  random  walk  towards  the  goal.  If  we 
think  of  the  random  walk  as  arising  from  some  underlying  mechanical  task,  then  this 
direction  must  be  given  by  either  the  mechanics  of  the  task  or  by  the  use  of  sensors. 
For  instance,  the  goal  might  physically  be  located  at  the  bottom  of  some  trough  or 
funnel.  Alternatively,  if  the  sensors  provide  enough  useful  information  then  one  may 
be  able  to  guide  the  system  towards  the  goal  on  average. 

The  moral  is  that  in  order  to  obtain  reasonable  convergence  times  for  some  task, 
one  should  try  locally  to  make  progress  on  the  average.  In  fact,  one  need  not  guarantee 
progress  at  every  location  or  at  every  moment.  However,  if  there  are  a  reasonable 
number  of  locations  for  which  progress  occurs  on  the  average,  then  convergence  will 
be  reasonably  quick.  This  view  of  the  world  is  considerably  different  from  the  one 
that  insists  on  guarantees  at  every  step. 

Given  these  observations,  the  study  of  robotics,  in  particular  the  study  of 
automating  the  solution  to  assembly  tasks,  becomes  one  of  finding  a  proper  mixture 
of  sensing,  motion,  and  randomization,  that  ensures  progress  on  the  average.  Other 
issues  include  the  definition  of  progress  itself,  plus  numerous  details  that  delineate 
the  scope  of  the  approach.  The  remainder  of  the  thesis  will  address  some  of  those 
issues. 


3.5  Expected  Progress 

The  first  issue  that  needs  to  be  addressed  is  the  definition  of  expected  velocity  in 
the  setting  of  a  general  Markov  chain.  For  the  one-dimensional  random  walk,  with 
transitions  only  to  neighbors,  this  was  fairly  straightforward,  but  we  need  a  precise 
definition for the general case. The second issue is whether these so-called velocities
behave nicely, in particular, whether increasing the expected velocity towards the goal
at some point reduces the expected time to reach the goal. This is certainly true in
the deterministic case, and one would like it to hold as well for the probabilistic case.

The  basic  motivation  for  defining  a  velocity  is  to  be  able  to  discuss  the  speed 
with  which  progress  towards  the  goal  is  made.  In  turn,  this  allows  us  to  analyze 
and compare different randomized strategies. So let us suppose that we have a finite
state space with states $s_0, s_1, \ldots, s_n$, with a single goal state $s_0$. Let us assume that
we are given a labelling of these states, that is, to each state $s_i$ there is associated
a one-dimensional number $\ell_i$ (perhaps a real number). The idea is to view these
labels as defining a progress measure, then to define expected velocities in terms of
the expected progress determined by this progress measure.

To make this precise, let $P = (p_{ij})$ be the probability transition matrix for some
chosen strategy for moving from non-goal states to the goal. We are assuming that
the task is formulated in such a way that the effect of our strategy may indeed be
described probabilistically at each step of execution. Then the average or expected
velocity at state $s_i$ is defined to be

(3.10)  $v_i = \sum_{j=0}^{n} p_{ij}\,(\ell_j - \ell_i).$

The sum on the right just measures the average displacement from state $s_i$,
measured in terms of the labelling, caused by a single step of the strategy. Thinking
of each step as being one unit of time then yields the average velocity. Note that we
can rewrite equation (3.10) as

(3.11)  $v_i = \sum_{j=0}^{n} p_{ij}\,\ell_j - \ell_i.$
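The definition translates directly into code. The sketch below evaluates the expected velocity as the label-weighted average displacement; the function name and the small example chain (the reflecting walk with p = 2/3 on four states, labelled by distance from the goal) are assumptions made for illustration.

```python
from fractions import Fraction

def expected_velocities(P, labels):
    """v_i = sum_j p_ij * (l_j - l_i): average one-step change in the label."""
    n = len(labels)
    return [sum(P[i][j] * (labels[j] - labels[i]) for j in range(n))
            for i in range(n)]

p, q = Fraction(2, 3), Fraction(1, 3)
P = [
    [1, 0, 0, 0],   # state 0 is the goal and is absorbing
    [p, 0, q, 0],
    [0, p, 0, q],
    [0, 0, p, q],   # reflection: the rightward step becomes a self-transition
]
labels = [0, 1, 2, 3]
v = expected_velocities(P, labels)
```

The interior states have velocity q - p = -1/3, and the reflecting state has velocity -p = -2/3, since its only possible displacement is towards the goal.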

That  is  the  definition,  and  here  is  the  main  claim  of  this  section.  It  establishes 
the  usefulness  of  the  definition  of  expected  velocity. 

Claim 3.7 Consider a Markov chain with states $\{s_i\}$ and probability transition
matrix $(p_{ij})$. One of the states, say $s_0$, is a goal state. By this we mean that all states
eventually transit to $s_0$. Suppose further that $\{\ell_i\}$ is a labelling of the states which
is zero at the goal state and positive elsewhere. Let $\ell = \max_i\{\ell_i\}$ be the maximum
label, and let $v = \max_i\{v_i\}$ be the maximum expected velocity defined by this labelling,
taken over the non-goal states. Finally, let $D = \max_i\{D_i\}$ be the maximum expected
time to reach the goal, where $D_i$ is the expected time to reach the goal given that the
system starts in state $s_i$.

The claim is that whenever v is negative, then

$$D \le \frac{\ell}{-v}.$$

Said differently, the maximum expected time to reach the goal is bounded by the
maximum distance to the goal (measured by the labelling), divided by the minimum
expected velocity of approach to the goal:

$$\max_i D_i \le \frac{\max_i\{\ell_i\}}{\min_i\{-v_i\}}.$$

In fact, for each state, the expected time to reach the goal is bounded by the state’s
label divided by the minimum expected approach velocity:

$$D_i \le \frac{\ell_i}{\min_j\{-v_j\}}.$$

Proof  Strategy.  The  basic  strategy  of  the  proof  is  to  first  establish  that  if  the 
expected  velocity  is  the  same  at  each  state,  then  the  expected  time  to  attain  the 
goal  is  just  the  state  label  divided  by  this  expected  velocity.  We  then  show  that  any 
Markov  chain  satisfying  the  hypotheses  of  the  claim  may  be  formally  modified  so  that 
the  expected  velocity  is  the  same  at  each  state.  [This  modification  is  purely  a  proof 
technique  and  has  nothing  to  do  with  the  underlying  physical  process.]  Finally,  we 
show  that  the  modified  Markov  chain  may  be  transformed  back  into  the  original  chain 
in  such  a  way  that  the  expected  convergence  times  decrease  or  remain  the  same.  This 
will  establish  the  claim. 


Proving  the  claim  will  require  a  little  bit  of  work,  but  it  is  intuitively  desirable 
and  clear.  The  claim  shows  that  under  suitable  conditions  a  general  Markov  chain 
behaves  very  much  as  does  the  one-dimensional  random  walk  discussed  in  section 
3.4.2.  Specifically,  if  a  randomized  strategy  can  ensure  that  on  the  average  it  decreases 
sufficiently  quickly  some  measure  of  distance  from  the  goal,  then  the  expected  time 
to  attain  the  goal  will  be  linear  in  that  measure.  From  a  planning  point  of  view  this 
suggests  two  problems:  finding  strategies  that  make  local  progress  relative  to  a  given 
progress  measure,  and  finding  useful  progress  measures. 

In  order  to  establish  the  claim,  we  will  state  and  prove  several  other  simple 
propositions.  These  will  provide  further  intuition  regarding  the  nature  of  progress 
measures  within  randomized  strategies.  First,  let  us  turn  the  problem  around.  Instead 
of  starting  with  a  labelling  of  the  state  space  and  determining  a  strategy  for  making 
progress  relative  to  the  labelling,  suppose  one  started  with  a  randomized  strategy. 
In  particular,  suppose  a  randomized  strategy  is  given  that  turns  the  state  space  into 
a  Markov  chain  that  eventually  converges  to  some  goal  state.  It  is  natural  to  ask 
whether  there  is  a  labelling  of  the  state  space  relative  to  which  the  strategy  may  be 
perceived  as  making  progress.  The  answer  is  of  course  yes.  If  one  simply  labels  the 
states  with  their  expected  times  until  success,  then  the  induced  expected  velocities 
will all be -1. Essentially the labelling spreads out the states far enough that the
distance  between  them  corresponds  precisely  to  the  difference  in  expected  times  to 
reach  the  goal.  Of  course,  the  labels  may  now  be  very  large  numbers!  We  prove  this 
observation  in  the  following  claim. 

Claim 3.8 Given a Markov chain $(\{s_i\}, (p_{ij}))$ for which all states eventually transit
to some goal state $s_0$, label each state $s_i$ with $D_i$, the expected time to attain the goal
given that the system starts in state $s_i$. Relative to this labelling the induced expected
velocities $\{v_i\}$ are all $-1$ (for non-goal states).

Proof. We have that $D_i = \sum_{j=0}^{n} p_{ij} D_j + 1$. This is just a generalization of the
argument used to establish the convergence times for random walks (section 3.4.2).
Rewriting this, we see that $\sum_{j=0}^{n} p_{ij} D_j - D_i = -1$. Interpreting the expected times
as labels, we see by (3.11) that the left-hand side of this equation is just $v_i$, which
establishes the claim. ∎
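As a concrete instance of the claim, consider the symmetric reflecting walk of section 3.4.2, whose expected times are $D_k = k(2a + 1 - k)$. The sketch below (helper names are ours) labels each state with its expected time and evaluates the induced velocities.

```python
from fractions import Fraction

def reflecting_walk(p, a):
    """Transition matrix on states 0..a: left with p, right with q = 1 - p,
    absorbing at 0 and reflecting at a."""
    q = 1 - p
    P = [[Fraction(0)] * (a + 1) for _ in range(a + 1)]
    P[0][0] = Fraction(1)
    for k in range(1, a + 1):
        P[k][k - 1] += p
        P[k][min(k + 1, a)] += q   # at k = a this is a self-transition
    return P

def expected_velocities(P, labels):
    """v_i = sum_j p_ij * l_j - l_i."""
    n = len(labels)
    return [sum(P[i][j] * labels[j] for j in range(n)) - labels[i]
            for i in range(n)]

a = 5
P = reflecting_walk(Fraction(1, 2), a)
D = [k * (2 * a + 1 - k) for k in range(a + 1)]   # labels = expected times
v = expected_velocities(P, D)
```

Every non-goal entry of `v` comes out as -1, even though the same walk has zero drift when the states are labelled by plain distance k.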

This  says  that  labellings  are  a  natural  means  of  characterizing  a  strategy’s 
behavior.  It  also  indicates  that  the  search  for  a  useful  labelling  is  futile,  since  any 
strategy  can  be  made  to  appear  to  converge  quickly  relative  to  a  suitable  labelling. 
It  is  in  fact  more  appropriate  to  view  the  situation  in  reverse.  If  one  is  interested 
in  convergence  speeds  of  a  particular  type,  then  one  should  look  at  labellings  whose 
labels  do  not  exceed  the  desired  convergence  times.  For  any  such  labelling  one  can 
then  determine  whether  a  strategy  exists  that  makes  rapid  progress.  Indeed,  in  many 
cases  a  natural  labelling  may  be  apparent,  such  as  one  given  by  the  distance  or 
distance  squared  from  some  goal. 

Finding a strategy given a labelling essentially entails choosing the $(n+1)^2$
probabilities $\{p_{ij}\}$, subject to the constraints $v_i < 0$ and $\sum_{j=0}^{n} p_{ij} = 1$, for all
i = 1, ..., n. If choosing these probabilities can be done independently for each state
$s_i$, then the existence of a fast strategy relative to a labelling may be ascertained very
quickly, since all the computations and constraints are local to each state $s_i$. In many
cases, however, the strategy cannot be determined locally. For instance, the action
performed in a given state will depend on a sensor value returned when the system is
in that state. Since different states can give rise to the same sensor value, a strategy
based on sensed values will necessarily couple the $p_{ij}$ at different states. We will see
the  significance  of  this  topic  later,  both  in  this  chapter  and  in  chapter  5.  Indeed  it  will 
turn  out  that  for  simple  labellings,  such  as  distance  from  the  goal,  average  progress 
cannot  always  be  guaranteed  for  every  state  in  the  system.  Instead,  one  naturally 
gets  mixtures  of  states,  some  for  which  rapid  progress  is  possible  and  some  for  which 
it  is  not,  just  as  we  did  for  the  two-dimensional  random  walk  discussed  in  section 
3.4.3. 

An  immediate  corollary  to  Claim  3.8  is  the  following. 

Corollary 3.9 If relative to some labelling $\{\ell_i\}$, all the expected velocities are equal
to a negative constant $v_{const}$, then the expected times to reach the goal are given by
$D_i = -\ell_i / v_{const}$.

Proof. Using the expression (3.11), we have that at each state $s_i$

$$v_{const} = v_i = \sum_{j=0}^{n} p_{ij}\,\ell_j - \ell_i.$$

Relative to a new labelling $\{\ell'_i\}$ given by $\ell'_i = \ell_i / (-v_{const})$, one observes that:

$$-1 = \sum_{j=0}^{n} p_{ij}\,\ell'_j - \ell'_i.$$

By the proof of claim 3.8, it must therefore be the case that $\ell'_i = D_i$ for all states
$s_i$. (Uniqueness of the $D_i$ follows from the assumption that all the states eventually
transit to the goal. See also chapter 10 of [KT2].) This establishes the corollary. |
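
As a quick numerical check of Corollary 3.9, the sketch below builds a small chain whose expected velocity is $-1/2$ at every non-goal state relative to the labelling $\ell_i = i$, and compares the solved expected times with $-\ell_i / v_{const}$. The chain, the labelling, and the helper name `expected_times` are assumptions made for this illustration only.

```python
from fractions import Fraction as F

def expected_times(P):
    """Solve D_i = 1 + sum_j P[i][j] * D_j for i >= 1, with D_0 = 0 (goal)."""
    n = len(P) - 1
    A = [[F(int(i == j)) - P[i][j] for j in range(1, n + 1)] + [F(1)]
         for i in range(1, n + 1)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if A[r][c] != 0)
        A[c], A[pivot] = A[pivot], A[c]
        A[c] = [a / A[c][c] for a in A[c]]
        for r in range(n):
            factor = A[r][c]
            if r != c and factor != 0:
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    return [F(0)] + [A[i][n] for i in range(n)]

labels = [0, 1, 2, 3]                      # assumed labelling, zero at the goal
P = [
    [1, 0, 0, 0],                          # s_0: absorbing goal
    [F(1, 2), F(1, 2), 0, 0],              # s_1: v = (1/2)(0 - 1) = -1/2
    [F(1, 4), 0, F(3, 4), 0],              # s_2: v = (1/4)(0 - 2) = -1/2
    [0, F(1, 4), 0, F(3, 4)],              # s_3: v = (1/4)(1 - 3) = -1/2
]

# Expected velocities from (3.11): v_i = sum_j p_ij (l_j - l_i).
v = [sum(p * (l - labels[i]) for p, l in zip(P[i], labels)) for i in range(1, 4)]
v_const = v[0]

D = expected_times(P)
predicted = [-l / v_const for l in labels]   # Corollary 3.9: D_i = -l_i / v_const
```

Here `D` and `predicted` both come out as `[0, 2, 4, 6]`.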

This  corollary  is  useful  in  conjunction  with  the  next  lemma,  which  establishes  that 
we  can  always  modify  a  finite  Markov  chain  whose  expected  velocities  are  negative 
so  that  its  expected  velocities  are  all  equal  to  some  non-zero  negative  constant.  In 
particular,  we  will  show  that  if  the  average  velocity  at  some  state  is  negative  then 
that  state’s  average  velocity  may  be  increased  (that  is,  its  absolute  value  may  be 
decreased)  by  changing  into  self-transitions  some  of  the  transitions  that  point  to 
states  with  lower  labels.  [Note  that  we  are  not  claiming  anything  about  whether  the 
underlying  physical  process  may  be  changed.]  For  a  finite  Markov  chain  with  negative 
expected  velocities  this  immediately  implies  that  the  chain  may  be  modified  so  that 
all expected velocities are equal to some negative constant. As we outlined on page 130, this
is useful as a proof device for the proof of claim 3.7.

Lemma 3.10 Consider a labelled Markov chain $(\{s_i\}, (p_{ij}), \{\ell_i\})$. Suppose that the
expected velocity $v_k$ at some state $s_k$ is negative. Let $a$ satisfy $v_k \le a \le 0$. Then one
can modify the $k$th row of the probability transition matrix $(p_{ij})$ so that the velocity at
$s_k$ becomes $a$. Furthermore, one need only increase $p_{kk}$ and commensurately decrease
$p_{kj}$ for values of $j$ for which $\ell_j < \ell_k$.

Proof. Let $\Delta v = v_k - a$. Then $v_k \le \Delta v \le 0$.

Since $v_k$ is negative, we have that $\ell_j < \ell_k$ for at least one $j$ (see the definition,
equation (3.10)). Furthermore, taking all these $s_j$ together, we must have that

$$\sum_{j:\ \ell_j < \ell_k} p_{kj}\,(\ell_j - \ell_k) \;\le\; v_k \;\le\; \Delta v.$$

For the purposes of argument it is enough to assume that there is one $j = j_0$ for
which $p_{kj_0}(\ell_{j_0} - \ell_k) \le \Delta v$. The general case follows readily from this.

Now define a new probability transition matrix $(p'_{ij})$ which is identical to $(p_{ij})$,
except for $p'_{kk}$ and $p'_{kj_0}$. Specifically, let $p'_{kk} = p_{kk} + p$ and $p'_{kj_0} = p_{kj_0} - p$, where
$p = \Delta v / (\ell_{j_0} - \ell_k)$. One verifies that $0 \le p \le p_{kj_0}$, so the construction makes sense. The
induced velocity is $v'_k = v_k - p\,(\ell_{j_0} - \ell_k) = v_k - \Delta v = a$, thus establishing the lemma. |

As  an  aside,  one  notes  that  the  lemma  holds  with  proper  modifications  for  positive 
expected  velocities,  although  this  is  less  useful  in  the  current  context. 
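
The row modification of Lemma 3.10 can be sketched in a few lines. The sketch below is illustrative and makes one design choice that is not in the text: instead of singling out one index $j_0$, it shrinks all downhill transitions by a common factor and moves the freed mass into the self-loop, which is one way to realize the "general case". The function names `velocity` and `slow_down` and the example row are assumptions.

```python
from fractions import Fraction as F

def velocity(row, labels, k):
    """Expected velocity at s_k: sum_j p_kj * (l_j - l_k)."""
    return sum(p * (l - labels[k]) for p, l in zip(row, labels))

def slow_down(row, labels, k, a):
    """Return a modified row k whose expected velocity equals a.

    Requires v_k < 0 and v_k <= a <= 0. Only p_kk is increased, and only
    transitions to states with smaller labels are decreased.
    """
    v = velocity(row, labels, k)
    if not (v < 0 and v <= a <= 0):
        raise ValueError("need v_k < 0 and v_k <= a <= 0")
    downhill = [j for j, l in enumerate(labels) if l < labels[k]]
    # Contribution of the downhill transitions to v_k; it satisfies S <= v_k < 0.
    S = sum(row[j] * (labels[j] - labels[k]) for j in downhill)
    uphill = v - S                      # contribution of the other transitions
    keep = (a - uphill) / S             # fraction of downhill mass kept, in [0, 1]
    new_row = list(row)
    for j in downhill:
        new_row[k] += (1 - keep) * row[j]
        new_row[j] = keep * row[j]
    return new_row

labels = [0, 1, 2, 3]
row3 = [F(1, 4), F(1, 4), F(1, 4), F(1, 4)]   # transitions out of s_3, v_3 = -3/2
new_row3 = slow_down(row3, labels, k=3, a=F(-1, 2))
```

For this example the modified row is `[1/12, 1/12, 1/12, 3/4]`, whose velocity is exactly $-1/2$. Because $a$ is strictly negative, no downhill probability is driven to zero, which is the observation made later in Comment 1.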

Now  we  need  a  lemma  that  goes  in  the  other  direction.  Specifically,  if  we  increase 
the  average  velocity  with  which  the  goal  is  approached  at  some  point,  then  we  would 
like  to  know  that  the  expected  time  to  reach  the  goal  decreases.  From  our  random 
walk example, and given the phrasing of this claim, this is intuitively clear, but in
a general setting some proof is required. The following lemma forms the core of our
proof of Claim 3.7.

Lemma 3.11 Consider a Markov chain with $n+1$ states $s_0, s_1, \ldots, s_n$ and probability
transition matrix $(p_{ij})$. Suppose state $s_0$ is the goal state (this means that all states
eventually transit to $s_0$ and remain there). Let $D_i$ be the expected time to reach $s_0$
given that the system starts in state $s_i$. Now consider two states $s_x$ and $s_y$ for which
$D_x > D_y$. Construct a new Markov chain on the same state space with a modified
probability transition matrix $(p'_{ij})$ that is almost identical to $(p_{ij})$. It differs in that
$p'_{xx} = p_{xx} - p$ and $p'_{xy} = p_{xy} + p$, where $p$ is any number satisfying $0 \le p \le p_{xx}$. If
$\{D'_i\}$ are the new expected times to reach the goal, then $D'_i \le D_i$ for all states.

Furthermore, if $p$ is non-zero, then $D'_x < D_x$.

Proof. The proof is long, although the idea is simple: Separate the behavior of
the system into two parts, namely what happens at all states but state $s_x$, and what
happens at state $s_x$. The behavior of the new system changes only at $s_x$ (although
the expected convergence times may change throughout the system), and intuitively
that change only increases the probability of moving closer to the goal. Thus the
expected convergence times should decrease. All this makes sense if we think of
expected convergence times as labellings akin to distance measures.

And  now  for  the  details. 

Let $g_i$ be the probability that starting in state $s_i$ the system reaches state $s_0$
before it reaches state $s_x$. This probability is well-defined for all states. Also, note
that $g_0 = 1$ and $g_x = 0$.

Let $D_i^{x}$ be the expected time to reach state $s_x$ from state $s_i$, given that the system
reaches $s_x$ before $s_0$.

Let $D_i^{0}$ be the expected time to reach state $s_0$ from state $s_i$, given that the system
does not pass through $s_x$.

And, let $D_i^{x,0}$ be the expected time to reach either state $s_x$ or state $s_0$ from state
$s_i$ before reaching the other.

One observes that $D_i^{x,0} = g_i D_i^{0} + (1 - g_i) D_i^{x}$, and that $D_i = g_i D_i^{0} + (1 - g_i)\,[D_i^{x} + D_x]$.

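Then for each non-goal state $s_i$, we have that

\begin{align}
D_i &= 1 + \sum_{j=0}^{n} p_{ij} D_j \tag{3.12} \\
    &= 1 + \sum_{j=0}^{n} p_{ij} \left[ g_j D_j^{0} + (1 - g_j)\left[ D_j^{x} + D_x \right] \right] \tag{3.13} \\
    &= 1 + \sum_{j=0}^{n} p_{ij} D_j^{x,0} + \sum_{j=0}^{n} p_{ij} (1 - g_j) D_x. \tag{3.14}
\end{align}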


Now, if we make changes to $p_{xx}$ and $p_{xy}$ as suggested, then the expected durations
$\{D_i\}$ will change, but all of the quantities $\{g_i\}$ and $\{D_i^{x,0}\}$ will remain the same. To
see this, observe that when $i \ne x$, $g_i$ depends only on transitions at states other than
state $s_x$. None of these transitions are affected by the changes to $p_{xx}$ and $p_{xy}$. A
similar argument applies to $\{D_i^{x,0}\}$ for $i \ne x$. Finally, observe that $g_x = 0$ always,
and thus that $D_x^{x,0} = D_x^{x} = 0$ always.

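Let us write out equation (3.14) for the state $s_x$, and simplify to get an expression
for $D_x$:

$$D_x = 1 + \sum_{j=0}^{n} p_{xj} D_j^{x,0} + \sum_{j=0}^{n} p_{xj} (1 - g_j) D_x.$$

So, solving for $D_x$,

$$D_x \left[ 1 - \sum_{j=0}^{n} p_{xj} (1 - g_j) \right] = 1 + \sum_{j=0}^{n} p_{xj} D_j^{x,0},$$

and thus:

$$D_x = \frac{1 + \sum_{j=0}^{n} p_{xj} D_j^{x,0}}{1 - \sum_{j=0}^{n} p_{xj} (1 - g_j)}. \tag{3.15}$$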


Now, let us introduce an artifice, by defining $\{D'_i\}$ and $\{p'_{ij}\}$ to be functions of $p$,
where $0 \le p \le p_{xx}$. As mentioned already, these are the only quantities that change.
In particular, we have that

$$p'_{ij}(p) = \begin{cases} p_{xx} - p, & \text{if } i = j = x \\ p_{xy} + p, & \text{if } i = x \text{ and } j = y \\ p_{ij}, & \text{otherwise.} \end{cases}$$


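Substituting these changes into equations (3.14) and (3.15), and noting that $D_x^{x,0} = 0$,
we have that

$$D'_i(p) = 1 + \sum_{j=0}^{n} p_{ij} D_j^{x,0} + \sum_{j=0}^{n} p_{ij} (1 - g_j) D'_x(p), \quad \text{if } i \ne x,\ i \ne 0. \tag{3.16}$$

\begin{align}
D'_x(p) &= \frac{1}{f(p)} \left[ 1 + \sum_{j \ne x,\, y} p_{xj} D_j^{x,0} + (p_{xy} + p)\, D_y^{x,0} + (p_{xx} - p)\, D_x^{x,0} \right] \\
        &= \frac{1}{f(p)} \left[ 1 + \sum_{j=0}^{n} p_{xj} D_j^{x,0} + p\, D_y^{x,0} \right], \tag{3.17}
\end{align}

where (recalling that $g_x = 0$)

\begin{align}
f(p) &= 1 - \sum_{j=0}^{n} p'_{xj}(p)\,(1 - g_j) \\
     &= 1 - \sum_{j \ne x,\, y} p_{xj} (1 - g_j) - (p_{xy} + p)(1 - g_y) - (p_{xx} - p) \\
     &= \sum_{j=0}^{n} p_{xj}\, g_j + p\, g_y. \tag{3.18}
\end{align}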

Notice that $f(p)$ is always positive, in particular, it is never zero, so these equations
make sense. To see that $f(p)$ cannot be zero, first observe that $f(p) \ge f(0) \ge 0$.
Then observe that $f(0)$ is just the probability that if the system starts in state $s_x$
it will reach the goal $s_0$ before reencountering state $s_x$. If this quantity were indeed
zero, then the goal would be unreachable from state $s_x$, violating our connectivity
assumption (see also section 3.2.7).

In order to establish the lemma one needs to verify that $D'_i(p) \le D'_i(0)$ for all $p$ in
the range $0 \le p \le p_{xx}$. From equation (3.16), it is clear that whenever $D'_x(p) \le D'_x(0)$,
then $D'_i(p) \le D'_i(0)$ for all states $s_i$ with $i \ne x$, so let us focus on showing that
$D'_x(p) \le D'_x(0)$. We will do this by showing that the derivative of $D'_x(p)$ with respect
to $p$ is negative for all relevant $p$. In fact, by showing that this derivative is strictly
negative, we establish the strict inequality of the lemma.

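Now

$$\frac{dD'_x(p)}{dp} = \frac{N(p)}{(f(p))^2},$$

so it is enough to establish that $N(p) < 0$, where by equations (3.17) and (3.18)

\begin{align}
N(p) &= D_y^{x,0}\, f(p) - \left[ 1 + \sum_{j=0}^{n} p_{xj} D_j^{x,0} + p\, D_y^{x,0} \right] \frac{df(p)}{dp} \\
     &= D_y^{x,0} \left( \sum_{j=0}^{n} p_{xj}\, g_j + p\, g_y \right) - \left( 1 + \sum_{j=0}^{n} p_{xj} D_j^{x,0} + p\, D_y^{x,0} \right) g_y \\
     &= D_y^{x,0} \left( \sum_{j=0}^{n} p_{xj}\, g_j \right) - g_y \left( 1 + \sum_{j=0}^{n} p_{xj} D_j^{x,0} \right).
\end{align}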


The assumption of the lemma that $D_x > D_y$ says that

\begin{align}
D_x &> D_y \\
    &= g_y D_y^{0} + (1 - g_y)\,[D_y^{x} + D_x] \\
    &= D_y^{x,0} + (1 - g_y)\, D_x.
\end{align}

So

$$g_y D_x > D_y^{x,0}. \tag{3.19}$$

From the expression for $D_x$ given by equation (3.17) and the expression for $f(p)$ given
by equation (3.18), we see that for $p = 0$

$$D_x = \frac{1 + \sum_{j=0}^{n} p_{xj} D_j^{x,0}}{\sum_{j=0}^{n} p_{xj}\, g_j}.$$

Thus equation (3.19) becomes

$$g_y\, \frac{1 + \sum_{j=0}^{n} p_{xj} D_j^{x,0}}{\sum_{j=0}^{n} p_{xj}\, g_j} > D_y^{x,0}.$$

But this says precisely that $N(p) < 0$, thereby establishing the lemma. |
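
The conclusion of Lemma 3.11 can be observed on a concrete chain. The following is a small illustrative sketch: the transition matrix, the choice $x = 3$, $y = 1$, $p = 1/4$, and the helper names `expected_times` and `shift_self_loop` are assumptions made for the example. It moves probability mass from the self-loop at $s_x$ onto the transition $s_x \to s_y$, where $D_y < D_x$, and recomputes the expected times exactly.

```python
from fractions import Fraction as F

def expected_times(P):
    """Solve D_i = 1 + sum_j P[i][j] * D_j for i >= 1, with D_0 = 0 (goal)."""
    n = len(P) - 1
    A = [[F(int(i == j)) - P[i][j] for j in range(1, n + 1)] + [F(1)]
         for i in range(1, n + 1)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if A[r][c] != 0)
        A[c], A[pivot] = A[pivot], A[c]
        A[c] = [a / A[c][c] for a in A[c]]
        for r in range(n):
            factor = A[r][c]
            if r != c and factor != 0:
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    return [F(0)] + [A[i][n] for i in range(n)]

def shift_self_loop(P, x, y, p):
    """Lemma 3.11 modification: p'_xx = p_xx - p and p'_xy = p_xy + p."""
    if not (0 <= p <= P[x][x]):
        raise ValueError("need 0 <= p <= p_xx")
    Q = [list(row) for row in P]
    Q[x][x] -= p
    Q[x][y] += p
    return Q

P = [
    [1, 0, 0, 0],                      # s_0: absorbing goal
    [F(1, 2), F(1, 4), 0, F(1, 4)],    # s_1 (may jump back up to s_3)
    [0, F(1, 2), F(1, 2), 0],          # s_2
    [0, 0, F(1, 2), F(1, 2)],          # s_3
]
D = expected_times(P)                                # D_3 = 8 > D_1 = 4
P_new = shift_self_loop(P, x=3, y=1, p=F(1, 4))
D_new = expected_times(P_new)
```

For this chain `D` is `[0, 4, 6, 8]` and `D_new` is `[0, 10/3, 16/3, 6]`: the time from $s_3$ drops strictly, and the times from $s_1$ and $s_2$ drop as well because both states can reach $s_3$.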


Comments  on  Lemma  3.11 

Lemma  3.11  nearly  allows  us  to  reverse  the  process  described  by  lemma  3.10.  The 
main  difference  is  that  lemma  3.10  refers  to  states  by  their  labels,  while  lemma  3.11 
refers to states by their expected convergence times. For a single state $s_x$ and a single
state $s_y$, as in the statement of lemma 3.11, this poses no serious problems. However,
in  order  to  prove  claim  3.7  we  will  need  to  apply  lemma  3.11  to  several  states  sx  and 
several  states  sy  simultaneously.  The  following  comments  are  intended  to  prove  that 
the  more  general  formulation  of  lemma  3.11  is  valid. 

Comment 1. Observe that if $a$ is strictly negative in lemma 3.10, then none of the
probabilities $\{p_{ij}\}$ need to become zero when they are changed, unless they are zero
already.  This  means  that  a  Markov  chain  that  satisfies  the  probabilistic  connectivity 
assumption  of  section  3.2.7  will  continue  to  satisfy  that  connectivity  condition  after 
the  modifications  of  lemma  3.10  have  been  performed.  In  other  words,  the  goal 
reachability  assumption  of  lemma  3.11  continues  to  be  satisfied.  The  purpose  of  that 
assumption  in  the  hypotheses  of  the  lemma  is  simply  to  ensure  that  the  expected 
convergence  times  are  well-defined.  In  particular,  the  theory  of  Markov  chains  tells 
us  that  the  system  of  linear  equations  relating  those  expected  convergence  times  has 
a  solution,  and  that  that  solution  is  unique. 

Comment 2. Observe further that lemma 3.11 continues to hold when the single
target state $s_y$ is replaced by a multitude of such states. In particular, suppose that
one is given a state $s_x$ and a collection of states $Y = \{s_{y_1}, \ldots, s_{y_k}\}$ disjoint from $s_x$.
Suppose further that there exist $k$ non-negative numbers $\{\lambda_{y_1}, \ldots, \lambda_{y_k}\}$ that satisfy
the conditions

$$\sum_{s_y \in Y} \lambda_y = 1, \qquad D_x > \sum_{s_y \in Y} \lambda_y D_y.$$

Then the conclusion of lemma 3.11 continues to hold, assuming that one takes
$p'_{xy} = p_{xy} + \lambda_y\, p$, for each state $s_y \in Y$. The proof of the lemma goes through as
before, except that $g_y$ is replaced by $\sum_{s_y \in Y} \lambda_y g_y$ and $D_y$ is replaced by $\sum_{s_y \in Y} \lambda_y D_y$.

Comment 3. If we look carefully at the proof of lemma 3.11, we see that the
claim of the lemma can be considerably strengthened. Recall comment 2. Now let
$p$ increase from $0$ to $p_{xx}$. Let us examine the behavior of the expected convergence
times $\{D'_i\}_{i=0}^{n}$. We have a three-way case statement:

1. If $D_x > \sum_{s_y \in Y} \lambda_y D_y$, then $D'_x$ decreases strictly, while all other $D'_i$ either remain
constant or decrease.

2. If $D_x < \sum_{s_y \in Y} \lambda_y D_y$, then $D'_x$ increases strictly, while all other $D'_i$ either remain
constant or increase.

3. If $D_x = \sum_{s_y \in Y} \lambda_y D_y$, then all $D'_i$ remain constant.

These comments follow from equation (3.16) and the computation of the derivative
$dD'_x/dp$ on page 135.

Comment 4. Finally, suppose that instead of a single state $s_x$ one is given a set of
states $\{s_x\}$ in lemma 3.11. Assume that for each such $s_x$ there is a collection of states
$Y_x$, whose weighted expected convergence times are less than the expected convergence
time $D_x$ of $s_x$, as outlined in comment 2. The claim is that if one simultaneously
modifies the transition probabilities as outlined in comment 2 for each of the states
$s_x$, then the expected convergence times of the resulting Markov chain improve.

We  would  like  to  know  that  this  generalization  of  lemma  3.11  is  correct.  Ideally, 
in  proving  this  generalization,  one  would  iteratively  apply  lemma  3.11  and  comment 
2  for  each  of  the  states  sx  in  turn,  until  all  the  modifications  suggested  had  been 
accomplished.  Unfortunately,  such  an  iterative  application  of  lemma  3.11  need  not 
be  valid.  This  is  because  the  lemma  says  little  about  the  relative  improvement  in 
expected convergence times for different states. Thus, in performing the modifications
suggested for one state $s_{x_i}$, one could possibly modify the expected convergence times
of all the other states in such a way that the hypotheses of the lemma no longer are
satisfied for some other state $s_{x_j}$. In particular, it could happen that

$$D'_{x_j} < \sum_{s_y \in Y_{x_j}} \lambda_y D'_y$$

in the modified Markov chain. In that case, lemma 3.11 no longer applies. We thus
need a slightly more elaborate argument. In particular, we will apply lemma 3.11
repeatedly. However, we will allow for the possibility that the transitions out of each
state $s_{x_i}$ may need to be modified several times.

Let us set up the general version of lemma 3.11, then offer a proof. The most
general version simply assumes that one may change the transition probabilities at
any non-goal state. Thus, for each non-goal state $s_i$ in the state space, suppose
that we are given $n+1$ numbers $\{\lambda_{ij}\}_{j=0}^{n}$ that are either all zero (meaning that no
transitions out of $s_i$ are to be changed), or that satisfy the following four conditions:

$$\lambda_{ii} = 0, \qquad \lambda_{ij} \ge 0, \quad j = 0, \ldots, n,$$

$$\sum_{j=0}^{n} \lambda_{ij} = 1, \qquad D_i > \sum_{j=0}^{n} \lambda_{ij} D_j.$$

Now, for each state $s_i$ for which the $\{\lambda_{ij}\}_{j=0}^{n}$ are not all zero, let $q_i$ be any number
satisfying $0 \le q_i \le p_{ii}$. Take $q_i$ to be zero if all the $\{\lambda_{ij}\}_{j=0}^{n}$ are zero. The probabilities
$\{q_i\}$ play the role of the probability $p$ in the statement of lemma 3.11. Suppose
that one constructs a new Markov chain on the old state space by modifying the
probabilities as follows (for $i = 1, \ldots, n$):

$$p'_{ij}(q_1, \ldots, q_n) = \begin{cases} p_{ii} - q_i, & \text{if } j = i \\ p_{ij} + \lambda_{ij}\, q_i, & \text{otherwise.} \end{cases}$$

[We assume, as always, that there are no transitions out of the goal.]

Then the claim is that the expected convergence times $D'_i = D'_i(q_1, \ldots, q_n)$ of the
new Markov chain are no worse than those of the old chain, for all legal values of
$q_1, \ldots, q_n$.

Let us focus only on those $q_i$ for which not all of the $\{\lambda_{ij}\}_{j=0}^{n}$ are zero, leaving all
the other $q_i$ fixed at zero. Consider one such $q_i$ for a moment. The proof of lemma
3.11, in conjunction with comment 2, tells us that for each state $s_j$ the expected
convergence time $D'_j(0, \ldots, 0, q_i, 0, \ldots, 0)$ improves or remains the same as $q_i$ varies
from $0$ to $p_{ii}$. However, the lemma says nothing about what happens if we vary several
of the $\{q_i\}$ simultaneously. Indeed, it is not difficult to construct examples for which
the expected convergence times first improve, then begin to get worse again, as the
$\{q_i\}$ are each in turn increased from zero to their appropriate maximum values ($p_{ii}$
for $q_i$). However, in all of these examples, the expected convergence times are always
better than the initial expected convergence times of the unmodified chain. In other
words, for all legal values of the $\{q_i\}$, and all states $s_j$, $D'_j(q_1, \ldots, q_n) \le D'_j(0, \ldots, 0)$.
We now outline a proof of this fact.

First, some notation. We will let the vector $\mathbf{q} \in \mathbb{R}^n$ be shorthand for $(q_1, \ldots, q_n)$.
Also, $\mathbf{q}_{max}$ will denote the vector $\mathbf{q}$ for which each of the $q_i$ is at its maximum legal
value (either $0$ or $p_{ii}$). Finally, let $D'_i(\mathbf{q})$ denote the expected convergence time for
state $s_i$, as determined by $\mathbf{q}$.

We will now construct a sequence $\mathbf{q}_0, \mathbf{q}_1, \ldots, \mathbf{q}_m$, such that $\mathbf{q}_0 = 0$ and $\mathbf{q}_m = \mathbf{q}_{max}$.
Furthermore, any one element $\mathbf{q}_k$ in this sequence differs from the previous element
$\mathbf{q}_{k-1}$ in exactly one coordinate. In other words, $\mathbf{q}_k - \mathbf{q}_{k-1} = (0, \ldots, 0, \Delta q_i, 0, \ldots, 0)$, for
some $\Delta q_i > 0$, representing an increase in the value of $q_i$. The sequence $\mathbf{q}_0, \ldots, \mathbf{q}_m$
will be chosen in such a manner that as $\mathbf{q}$ varies from $\mathbf{q}_k$ to $\mathbf{q}_{k+1}$ the expected
convergence times $\{D'_j(\mathbf{q})\}_{j=0}^{n}$ all either decrease or remain the same. It follows that
$D'_j(\mathbf{q}_{max}) \le D_j$, for each $j = 0, 1, \ldots, n$. And thus the general version of lemma 3.11
will be proven.

In order to construct the sequence $\mathbf{q}_0, \ldots, \mathbf{q}_m$, define the functions
$L_i(\mathbf{q}) = -D'_i(\mathbf{q}) + \sum_{j=0}^{n} \lambda_{ij} D'_j(\mathbf{q})$, for $i = 1, \ldots, n$. Observe, by comment 3, that whenever
$L_i(\mathbf{q})$ is negative, then $q_i$ may be increased up to its maximum legal value without
increasing any of the expected convergence times $D'_j(\mathbf{q})$. Furthermore, if actually
$L_i(\mathbf{q})$ is zero, then $q_i$ may be changed within its legal limits without affecting the
expected convergence times at all.

By hypothesis, all the $\{L_i\}$ are negative or zero at $\mathbf{q} = 0$. Without loss of
generality, we may assume that $L_1$ is negative. Then one may construct $\mathbf{q}_1$ by
changing $q_1$. In particular, if it is possible to change $q_1$ to its maximum value without
causing any of the $\{L_j\}$ to become positive, then we will do so. Otherwise, one
of the $\{L_j\}$ must become zero for some value of $\Delta q_1$. In particular, suppose that
$L_{j_1}(\mathbf{q}_1) = 0$ for $\mathbf{q}_1 = (\Delta q_1, 0, \ldots, 0)$. Then we can next allow $q_{j_1}$ to vary from zero to
its maximum legal value (say $p_{j_1 j_1}$) without affecting any of the convergence times.
In other words, $\mathbf{q}_2 = (\Delta q_1, 0, \ldots, 0, p_{j_1 j_1}, 0, \ldots, 0)$. In computing $\mathbf{q}_3$ we again increase
$q_1$. This is legal, since $L_1(\mathbf{q}_2) \le 0$ by construction. We repeat this process, until $q_1$
has been increased to its maximum value, at which point we move on to some other
$q_i$. The whole process is repeated several times, until all the $q_i$ have been changed
from zero to their maximum values.

The key observation in the process described above is that we always modify only
one $q_i$ at a time, namely one for which $L_i \le 0$. In particular, if $L_i = 0$, we modify
$q_i$ completely, and thereafter forget about it. If it is merely true that $L_i < 0$, then
we are careful to modify $q_i$ only so far until one of the other $\{L_j\}$ that are still under
consideration becomes zero. And so forth.

We  can  now  finally  prove  the  main  claim,  namely  Claim  3.7. 

Proof of Claim 3.7. Recall that $v$ is the maximum expected velocity of the
Markov chain relative to the labelling $\{\ell_i\}$, and that this expected velocity is strictly
negative. By Lemma 3.10, one can modify the transition probabilities so that the
expected velocity at each state is exactly equal to $v$. Furthermore, for a given state
$s_x$, the only changes to the transition probabilities entail increasing $p_{xx}$ and decreasing
$p_{xy}$ for values of $y$ that satisfy $\ell_y < \ell_x$. By Corollary 3.9 the expected success times of
the resulting chain are precisely proportional to the state labels, with proportionality
constant $-1/v$.

Now imagine reversing the modifications, so that one gets back the original chain.
This is just the process described by Lemma 3.11 and subsequent comments. These
observations therefore establish that the expected success times of the original chain
are no greater than the success times of the modified chain. In short, $D_i \le -\ell_i / v$, as
claimed. |
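
The bound $D_i \le -\ell_i / v$ of Claim 3.7 can be illustrated on a chain whose expected velocities are negative but not all equal. The sketch below is an assumption-laden example: the transition matrix, the labelling $\ell_i = i$, and the helper name `expected_times` are chosen for illustration only. It computes the maximum expected velocity $v$ and compares the exact expected times against the bound.

```python
from fractions import Fraction as F

def expected_times(P):
    """Solve D_i = 1 + sum_j P[i][j] * D_j for i >= 1, with D_0 = 0 (goal)."""
    n = len(P) - 1
    A = [[F(int(i == j)) - P[i][j] for j in range(1, n + 1)] + [F(1)]
         for i in range(1, n + 1)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if A[r][c] != 0)
        A[c], A[pivot] = A[pivot], A[c]
        A[c] = [a / A[c][c] for a in A[c]]
        for r in range(n):
            factor = A[r][c]
            if r != c and factor != 0:
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    return [F(0)] + [A[i][n] for i in range(n)]

labels = [0, 1, 2, 3]                              # assumed labelling, zero at the goal
P = [
    [1, 0, 0, 0],                                  # s_0: absorbing goal
    [F(3, 4), F(1, 4), 0, 0],                      # s_1: v_1 = -3/4
    [F(1, 4), F(1, 4), F(1, 4), F(1, 4)],          # s_2: v_2 = -1/2
    [0, F(1, 2), F(1, 4), F(1, 4)],                # s_3: v_3 = -5/4
]

v = [sum(p * (l - labels[i]) for p, l in zip(P[i], labels)) for i in range(1, 4)]
v_max = max(v)                    # maximum (closest to zero) expected velocity
D = expected_times(P)
bounds = [-l / v_max for l in labels]   # Claim 3.7: D_i <= -l_i / v
```

In this example `v_max` is $-1/2$, the bounds are `[0, 2, 4, 6]`, and the exact expected times `[0, 4/3, 17/6, 19/6]` lie below them at every state.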
3.6 Progress in Tasks with Non-Deterministic Actions

The  claims  of  the  previous  section  apply  to  actions  in  which  the  effects  at  each  step 
and  in  each  state  are  probabilistically  determined.  However,  as  we  mentioned  at  the 
outset,  for  many  tasks  the  actions  are  merely  non-deterministic,  that  is,  no  probability 
distributions are given. In this case, the claims do not apply. In particular, the
definition  of  an  average  velocity  no  longer  makes  sense.  Of  course,  one  can  define  a 
worst-case  velocity,  which  measures  the  least  amount  of  progress  possible  in  any  given 
state,  and  then  variants  of  some  of  the  claims  will  go  through.  This  does  not  seem 
very  satisfying,  for  two  reasons.  First,  insisting  that  the  worst-case  velocity  point 
towards  the  goal  is  not  much  different  from  insisting  on  deterministic  actions.  And 
second,  often  it  is  simply  not  possible  to  ensure  that  progress  is  made  towards  the 
goal.  This  is  particularly  true  in  the  imperfect  sensing  case. 

Surprisingly  enough,  however,  the  condition  that  the  worst-case  velocities  point 
towards  the  goal  characterizes  the  tasks  for  which  solutions  exist,  at  least  when  sensing 
is  perfect.  This  section  therefore  presents  a  brief  exposition  of  this  condition. 

First,  we  have  the  following  claim. 

Claim  3.12  Let  (S,  A,  E,  Q)  be  a  discrete  planning  problem,  where  S  is  the  set  of 
states,  A  is  the  set  of  actions,  E  is  the  sensing  function,  and  Q  is  the  set  of  goal 
states.  Assume  that  E  is  the  perfect-sensing  function  (this  will  be  relaxed  later). 

Suppose that there exists a guaranteed strategy for moving from any state to the
goal set $Q$. Then there exists a sequence of disjoint sets $\mathcal{S}_0, \mathcal{S}_1, \ldots, \mathcal{S}_\ell$ that cover $S$,
such that states in the set $\mathcal{S}_{i+1}$ can traverse to states in the union $\mathcal{D}_i = \bigcup_{j=0}^{i} \mathcal{S}_j$ in a
single step. Furthermore, $\mathcal{S}_0 = Q$, and $\ell \le r = |S| - |Q|$.

It follows that there is a perfect-sensing strategy that moves through the tower of
sets $S = \mathcal{D}_\ell \supset \cdots \supset \mathcal{D}_1 \supset \mathcal{D}_0 = Q$ in decreasing order until the goal is attained.
Furthermore, the strategy moves down at least one level in this tower on each step of
execution.

[A definitional note: To say that a strategy is in one of the levels $\mathcal{D}_i$ at a particular
time, means that at that time one of the possible states of the system is in the set
$\mathcal{S}_i$ and no state is in a set $\mathcal{S}_j$ with $j > i$. To say that a strategy moves between two
levels means that there is an execution trace for which the strategy first finds itself
in one level, then one step later in the next level.]

Proof. The proof is based on the construction used in the proof of claim 3.4, which
we repeat here. Define $\mathcal{S}_0$ to be $Q$, and then inductively define $\mathcal{S}_{i+1}$ to be the set of
all states $s$ in $S - \bigcup_{j=0}^{i} \mathcal{S}_j$ for which there exists a single-step action that causes $s$
to traverse to some state in the set $\bigcup_{j=0}^{i} \mathcal{S}_j$. The sets $\mathcal{S}_i$ are well-defined, and by
construction they are disjoint. We need to show that they cover $S$ and that there are
no more than $r$ of them. Note that the set $\bigcup_{j=0}^{i} \mathcal{S}_j$ is just the set of all states that can
be guaranteed to reach a goal state in $i$ or fewer steps. The existence of a guaranteed
strategy means that all states can be guaranteed to reach the goal in a finite number
of steps, so the $\{\mathcal{S}_i\}$ cover $S$. Finally, by claim 3.5, no more than $r$ steps are ever
required.

The  second  part  of  the  claim  follows  immediately.  | 
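
The inductive construction in the proof amounts to backward chaining from the goal. The following is a minimal sketch of that construction, with several assumptions: the example problem is invented, the name `goal_levels` is hypothetical, and "traverse to some state in the set" is read in the guaranteed sense, meaning that every possible outcome of the chosen action lies in the union of the earlier levels.

```python
def goal_levels(states, actions, goals):
    """Partition states into levels S_0, S_1, ... by guaranteed steps to the goal.

    actions maps (state, action_name) to the non-empty set of states that the
    non-deterministic action might transit to from that state.
    """
    levels = [set(goals)]
    reached = set(goals)                 # the union D_i of the levels so far
    while True:
        # States not yet placed that have an action whose outcomes all lie in D_i.
        nxt = {s for s in states - reached
               if any(src == s and outcomes <= reached
                      for (src, _name), outcomes in actions.items())}
        if not nxt:
            break
        levels.append(nxt)
        reached |= nxt
    return levels

# Hypothetical planning problem with five states and a single goal state 0.
states = {0, 1, 2, 3, 4}
goals = {0}
actions = {
    (1, "a"): {0},
    (2, "a"): {0, 1},
    (2, "b"): {0, 2},      # might leave the system where it is
    (3, "a"): {1, 2},
    (3, "b"): {0, 4},
    (4, "a"): {0, 1},
}
levels = goal_levels(states, actions, goals)
covered = set().union(*levels)
```

For this problem the levels are `[{0}, {1}, {2, 4}, {3}]`. They cover all five states, and the index of the last level, 3, does not exceed $r = |S| - |Q| = 4$. If some state were left outside `covered`, no guaranteed strategy would exist from that state.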

This  claim  is  very  similar  to  the  proofs  of  claims  3.4  and  3.5.  The  difference  is  that 
the  current  claim  emphasizes  the  overall  structure  of  the  perfect-sensing  strategy.  Not 
only  do  individual  execution-time  traces  of  the  perfect-sensing  strategy  never  need 
to  revisit  a  state,  but  seen  as  a  whole,  the  strategy  should  permanently  prune  away 
possible  states  on  each  step.  Intuitively  this  makes  sense.  After  all,  if  there  is  some 
state  that  does  not  get  pruned  away,  then  it  is  possible  to  repeatedly  encounter  that 
state,  which  means  that  the  strategy  cannot  be  guaranteed  to  converge  to  the  goal. 

Notice  also  that  the  fact  that  the  sensing  function  is  perfect  is  really  not  used  in 
the  proof,  except  to  limit  the  number  of  sets  that  are  required.  This  should  not  be 
surprising,  given  the  equivalence  between  the  existence  of  a  perfect-sensing  strategy 
and  goal  reachability  in  general,  as  established  by  claim  3.4.  This  then  leads  us  to 
believe  that  the  claim  holds  for  an  arbitrary  sensing  function,  and  indeed  it  does,  with 
precisely the same sets $\{\mathcal{S}_i\}$. However, whereas these sets actually define a strategy
in  the  perfect-sensing  case,  they  only  hint  at  one  in  the  general  case.  Specifically,  one 
has  the  following  corollary,  which  despite  the  length  of  statement,  is  actually  quite 
weak. 

Corollary 3.13 Consider a system as in the previous claim, but with an arbitrary
sensing function $E$. Construct the sets $\{\mathcal{D}_i\}$, as in the proof of the claim. Suppose
there is a strategy that is guaranteed for each possible starting state to attain the goal
set $Q$.

Then this strategy must necessarily move through the sets $S = \mathcal{D}_\ell \supset \cdots \supset \mathcal{D}_1 \supset
\mathcal{D}_0 = Q$ in decreasing order until the goal is attained. However, the strategy can spend
several steps of execution within one level of this tower before proceeding down to a
lower level, and can even move back up levels.

Proof. If the strategy is in level $\mathcal{D}_i$, then there is at least one possible state of the
system  that  may  require  i  steps  to  reach  the  goal.  This  says  that  there  is  a  possible 
execution  trace  for  which  the  system  must  first  pass  through  an  immediately  lower 
level  before  reaching  the  goal.  | 

In order to summarize, suppose we have a non-negative labelling of the states $\{\ell_i\}$
that is zero at the goal. Now define for any state $s$ and any action $A$, the worst-case
velocity to be

$$v_{A,s} = \max_{t \in F_A(s)} (\ell_t - \ell_s),$$

where $F_A(s)$ is the set of states that the non-deterministic action $A$ might transit
to from state $s$.




CHAPTER  3.  RANDOMIZATION  IN  DISCRETE  SPACES 


In a sense, this velocity measures the minimum approach to the goal, relative to
the labelling. If $v_{A,s}$ is negative, then no matter which non-deterministic transition
is actually followed, the system is making progress.

For the perfect-sensing case, we see that one can label all states in the set $\mathcal{S}_i$
with the number $i$. Then the worst-case velocity at each point is $-1$, and the system
will reach the goal in no more than $t = -\ell_k / v_{A_k, s_k}$ steps, for each state $s_k$ and its
perfect-sensing action $A_k$. In short the formalism carries through, trivially, in the
perfect-sensing case.
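
The worst-case velocity is simple to evaluate once the states are labelled by level. The sketch below is illustrative only: the action table and the labels are an invented example in which each state is labelled with its level index, and the names `worst_case_velocity`, `best`, and `step_bound` are assumptions. For each state it selects an action with the most negative worst-case velocity and derives the step bound $-\ell_k / v_{A_k, s_k}$.

```python
from fractions import Fraction as F

# Assumed labelling: each state carries the index of its level (goal state 0 has label 0).
labels = {0: 0, 1: 1, 2: 2, 4: 2, 3: 3}

# actions maps (state, action_name) to the set F_A(s) of possible next states.
actions = {
    (1, "a"): {0},
    (2, "a"): {0, 1},
    (2, "b"): {0, 2},      # might make no progress at all
    (3, "a"): {1, 2},
    (3, "b"): {0, 4},
    (4, "a"): {0, 1},
}

def worst_case_velocity(s, a):
    """v_{A,s} = max over t in F_A(s) of (l_t - l_s)."""
    return max(labels[t] - labels[s] for t in actions[(s, a)])

# For each non-goal state keep an action with the most negative worst-case velocity.
best = {}
for (s, a) in actions:
    v = worst_case_velocity(s, a)
    if s not in best or v < best[s][1]:
        best[s] = (a, v)

# Bound on the number of steps to the goal from each state: -l_k / v.
step_bound = {s: F(-labels[s], v) for s, (_a, v) in best.items()}
```

Every selected action has worst-case velocity $-1$, so `step_bound` reproduces the labels: 1, 2, 3, and 2 steps for states 1, 2, 3, and 4. Action `"b"` at state 2 has worst-case velocity 0 and is therefore never selected.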

In  the  more  general  case,  it  is  not  clear  whether  it  really  makes  sense  to  define 
the  worst-case  velocity  at  a  state.  For  one  thing,  the  action  executed  in  a  state  is 
not  well-defined,  since  a  state  may  be  revisited  several  times  during  the  execution  of 
a  strategy,  but  the  action  executed  will  generally  depend  on  the  system’s  knowledge 
state,  which  will  be  different.  Hand  in  hand  with  this  is  the  lack  of  substance  provided 
by  a  progress  measure  on  the  state  space,  as  indicated  by  corollary  3.13.  However, 
if one not only maximized over all possible target states in the definition, but also
over all possible actions that a strategy might execute in a given state, that is, if one
defined the worst-case velocity to be

$$v_s = \max_{\text{applicable actions } A}\ \max_{t \in F_A(s)} (\ell_t - \ell_s), \tag{3.20}$$

then the linear convergence result basically goes through as before. This result is
probably too weak to be useful.

One  issue  may  be  bothersome.  In  the  probabilistic  setting,  this  strong  distinction 
regarding  the  use  of  a  progress  measure  with  perfect  sensing  versus  the  use  of  a 
progress  measure  with  imperfect  sensing  did  not  seem  to  arise.  In  fact,  it  does  arise, 
but  this  was  not  relevant  to  the  discussion  on  Markov  chains.  In  the  discussion  of 
progress  measures  on  Markov  chains  we  were  tacitly  assuming  that  the  effect  of  an 
action  and  the  interpretation  of  a  sensor  in  deciding  on  an  action  depended  only  on 
the  current  state  of  the  system.  This  makes  sense  in  the  perfect-sensing  case.  It  also 
makes  sense  if  actions  are  selected  directly  as  a  function  of  sensor  values,  and  not  as 
a  function  of  the  interpretation  of  those  sensor  values  in  terms  of  past  information. 
In  general,  however,  the  interpretation  of  a  sensor  depends  on  the  knowledge  state 
of  the  executive  system,  not  just  on  the  actual  state  of  the  system  (see  sections  3.2.5 
and  3.2.6).  Once  one  re-introduces  this  dependence,  then  the  distinction  between 
perfect  and  poor  sensing  becomes  important,  both  in  the  non-deterministic  and  the 
probabilistic  settings. 


3.7  Imperfect  Sensing 

In  general,  given  an  imperfect  sensor,  the  appropriate  states  of  the  system  are  the 
knowledge  states.  In  the  non-deterministic  setting  these  are  all  the  subsets  of  the 
underlying  state  space,  while  in  the  probabilistic  setting  these  are  all  the  probability 
distributions  over  the  underlying  state  space.  One  can  then  define  labellings  and 
progress measures as before on this space of knowledge states, and all the results will
go  through.  Formally,  this  is  the  correct  description  of  the  problem.  However,  as  we 
have  already  indicated,  the  general  planning  problem  is  hard,  which  is  reflected  in 
the  exponential  size  of  the  space  of  knowledge  states.  For  this  reason  one  seeks  less 
complete  approaches  that  nonetheless  can  handle  a  variety  of  tasks.  This  is  precisely 
the  reason  that  we  decided  to  consider  the  special  case  settings  of  random  walks  and 
Markov  chains  in  the  first  place. 


3.8  Planning  with  General  Knowledge  States 

In  order  to  deal  with  the  general  sensing  case,  it  is  useful  to  consider  a  planner 
for  determining  guaranteed  strategies  for  achieving  the  goal.  A  guaranteed  strategy 
in  this  context  means  a  bounded  number  of  actions  and  sensory  operations  that  are 
certain  to  attain  a  goal  state  from  the  specified  initial  states,  under  the  specified  model 
of  uncertainty.  In  general  the  actions  will  be  functions  of  sensors,  that  is,  the  strategy 
will  involve  conditional  choices  based  on  non-deterministic  events  whose  outcomes 
cannot  be  predicted  at  planning  time.  Nonetheless,  the  flow  graph  of  these  choices 
and  non-deterministic  events  can  be  written  out,  and  it  has  finite  size  and  converges 
to  the  goal.  The  planning  approach  described  in  this  section  is  a  specialization  of  the 
preimage  planning  approach  defined  by  [LMT].  It  may  also  be  thought  of  as  dynamic 
programming  with  a  boolean  cost  function  (see  [Bert]).  Before  reading  this  section, 
it  may  be  worthwhile  to  reread  the  example  of  section  3.2.4. 

The  basic  idea  is  to  apply  backchaining  in  a  state-space  whose  states  are  the 
knowledge  states  of  the  executive  system.  By  construction,  “sensing”  in  such  a  space 
is  perfect.  This  means  that  by  definition  the  system  knows  exactly  which  (knowledge) 
state  it  is  in  at  any  point  during  execution.  The  approach  is  applicable  to  both  the 
non-deterministic  and  the  probabilistic  settings.  Let  us  just  briefly  outline  how  the 
planner might proceed for the non-deterministic setting. If $S$ is the underlying state
space, then the planner’s state space is the set of all knowledge states, that is, the
space $2^S$. Let us assume that an action is always followed by some kind of sensing
operation (possibly a no-op). If $A$ is an action in the underlying state space, and $K_1$
is a knowledge state, then the result of applying action $A$ is a new knowledge state
$K_2$ (see section 3.2.5 for the details of how to construct $K_2$). Now suppose that a
finite number of sensory interpretation sets can be returned by the sensor after the
action has been executed. The actual sensory interpretation set will in general depend
on the actual state of the system, plus possibly several other parameters. Let this
collection be $\{I_1, I_2, \ldots, I_l\} = \bigcup_{s \in K_2} \Xi(s)$, and define $K_2^i$ by $K_2^i = K_2 \cap I_i$. Then we
can write the non-deterministic effect of the combination of action $A$ and sensing in
the space $2^S$ as

$\mathcal{A} : K_1 \mapsto K_2^1, K_2^2, \ldots, K_2^l.$

This means that at execution time the action $\mathcal{A}$ (which corresponds to performing
action $A$ followed by some sensory operation) will transit non-deterministically from
(knowledge) state $K_1$ to one of the states $K_2^i$. By construction, our sensing guarantees
that the execution system will know precisely which knowledge state has been
attained. Thus the problem is a perfect-sensing problem.

Since  the  problem  has  a  perfect-sensing  function,  one  can  apply  the  techniques 
previously  discussed  for  such  problems.  In  particular,  one  can  plan  strategies  for 
achieving  a  goal  state  (and  knowing  that  it  has  been  achieved)  by  backchaining 
from the goal in the space $2^S$. This amounts to applying the dynamic programming
discussed in section 3.2.4. Backchaining entails first determining the collection $\mathcal{K}_1$
of all knowledge states that can attain a goal state with the execution of a single
action-sense pair, then determining the collection $\mathcal{K}_2$ of all knowledge states that can
attain one of the knowledge states in the collection $\mathcal{K}_1$ using a single action-sense
pair, and so forth. This construction is identical to the construction of the sets $\{\mathcal{S}_i\}$
in claim 3.12, but now these sets reside in the space $2^S$. The method of transforming
an imperfect-sensing problem into a perfect-sensing problem by moving to the space
of knowledge states is a standard technique (see, for instance, [Bert]). As we see,
this transformation combines an action and a sensory operation in the underlying
state space into a perfect-sensing operation in the space of knowledge states. An
alternate approach is to model the effect of actions and sensing operations as defining
an AND/OR graph in the knowledge space (see [Buc] or [TMG] for details).

There  are  two  basic  possible  formulations  of  the  planning  problem.  One  is  to  seek 
a sequence of motions that is guaranteed to move the initial knowledge state $\mathcal{I}$ to the
goal state $\mathcal{G}$. [The notation may be confusing, since so far we have thought of $\mathcal{G}$ as
being a set of goal states, but in the space of knowledge states this is just one state.
More generally, one could have several such sets $\{\mathcal{G}_i\}$, and the formalism would go
through.] A second approach is to limit a priori the number of steps considered. In
this case, one backchains for the specified number of steps, whereas in the previous
case one backchains until all knowledge states have been considered. In both cases, of
course, one can stop if at any step the backchaining process generates the knowledge
state $\mathcal{I}$.

An  example  should  clarify  all  this  notation: 

Suppose there are three states, $s_1$, $s_2$, and $s_G$, where $s_G$ is the goal state. Suppose
that our sensor is good enough to tell us whether we are in the goal or not, but cannot
distinguish between states $s_1$ and $s_2$. Finally let there be two actions, $A_1$ and $A_2$,
specified by:

$$\begin{array}{llll}
A_1 : & s_1 \mapsto s_2,\ s_G & \qquad A_2 : & s_1 \mapsto s_1 \\
      & s_2 \mapsto s_2       &              & s_2 \mapsto s_G \\
      & s_G \mapsto s_G,      &              & s_G \mapsto s_G.
\end{array}$$

These  actions  are  depicted  in  figure  3.14. 

The  space  of  knowledge  states  is  given  by  the  seven  sets  (we  exclude  the  empty 
set,  which  implies  inconsistent  knowledge): 


$\{s_1, s_2, s_G\}$, $\{s_1, s_2\}$, $\{s_1, s_G\}$, $\{s_2, s_G\}$, $\{s_1\}$, $\{s_2\}$, $\{s_G\}$.

Figure 3.14: Some simple non-deterministic actions on a discrete state graph.

For instance, the knowledge state

$\{s_1, s_2\}$

means that at execution time the system knows that it is either in state $s_1$ or state $s_2$,
but it does not know which one. If the system always performs a sensory operation
after each action, then, since the sensor can recognize the goal state for sure, one may
actually eliminate from the space of knowledge states all states that contain both the
goal state and some other state. Thus the relevant planning space is given by the
four knowledge states

$\{s_1, s_2\}$, $\{s_1\}$, $\{s_2\}$, $\{s_G\}$,

with $\mathcal{G} = \{s_G\}$ as goal state.

Let  us  compute  the  actions  induced  in  the  knowledge  space  by  an  action-sense 
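pair. For action $A_1$ we have:

$$\begin{array}{lrcl}
A_1 : & \{s_1, s_2\} & \mapsto & \{s_2\},\ \{s_G\} \\
      & \{s_1\}      & \mapsto & \{s_2\},\ \{s_G\} \\
      & \{s_2\}      & \mapsto & \{s_2\} \\
      & \{s_G\}      & \mapsto & \{s_G\},
\end{array}$$

while $A_2$ becomes:

$$\begin{array}{lrcl}
A_2 : & \{s_1, s_2\} & \mapsto & \{s_1\},\ \{s_G\} \\
      & \{s_1\}      & \mapsto & \{s_1\} \\
      & \{s_2\}      & \mapsto & \{s_G\} \\
      & \{s_G\}      & \mapsto & \{s_G\}.
\end{array}$$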

See  figure  3.15  for  a  graphical  display. 
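The induced knowledge-space actions can be checked mechanically. The following is a minimal sketch, assuming the action definitions read off from the example ($A_1$ sends $s_1$ to either $s_2$ or $s_G$ and fixes the other states; $A_2$ fixes $s_1$ and sends $s_2$ to $s_G$) and assuming a sensor that only distinguishes goal from non-goal. The names `ACTIONS`, `INTERPRETATIONS`, `forward_projection`, and `action_sense` are ours and purely illustrative.

```python
from itertools import chain

# Underlying non-deterministic actions of the three-state example
# (assumed from the text): state -> set of states it may reach.
ACTIONS = {
    "A1": {"s1": {"s2", "sG"}, "s2": {"s2"}, "sG": {"sG"}},
    "A2": {"s1": {"s1"}, "s2": {"sG"}, "sG": {"sG"}},
}

# Sensory interpretation sets: the sensor reports "goal" or "not goal".
INTERPRETATIONS = [frozenset({"sG"}), frozenset({"s1", "s2"})]


def forward_projection(action, knowledge):
    """All states reachable from some state in `knowledge` under `action`."""
    return frozenset(chain.from_iterable(ACTIONS[action][s] for s in knowledge))


def action_sense(action, knowledge):
    """Knowledge states that may result from `action` followed by sensing."""
    projected = forward_projection(action, knowledge)
    outcomes = {projected & reading for reading in INTERPRETATIONS}
    return {k for k in outcomes if k}  # drop readings that cannot occur
```

For example, `action_sense("A1", frozenset({"s1", "s2"}))` yields the two knowledge states $\{s_2\}$ and $\{s_G\}$, in agreement with the first row for $A_1$.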
Figure 3.16: This figure shows how a backchaining strategy might evolve in the space
of  knowledge  states.  See  also  figure  3.15. 


Here  is  the  dynamic  programming  table  for  this  problem.  The  horizontal  axis 
of the table indicates the number of steps remaining, the vertical axis indicates the
current knowledge state. Each entry indicates the action to take in order to attain the
goal, given the number of steps remaining and the current knowledge state. The table
is constructed by first backchaining from the goal, in order to construct all entries
in the column for one remaining step, then backchaining from that column, and so
forth. A blank entry indicates that it is not possible to successfully and recognizably
attain the goal in the number of steps specified from that knowledge state.


                        Steps Remaining
Knowledge States        2        1        0

$\{s_1, s_2\}$          $A_1$
$\{s_1\}$               $A_1$
$\{s_2\}$               $A_2$    $A_2$
$\{s_G\}$               stop     stop     stop

Actions guaranteed to attain the goal.

As  the  table  indicates,  it  is  possible  to  attain  the  goal  from  any  non-goal  initial  state 
of  knowledge  in  at  most  two  steps. 
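The table can be reproduced by a short backchaining loop. This is a sketch under the same assumptions about the example's actions and sensor as stated in the text; the function `build_table` and the tie-breaking rule (carry an entry over from the previous column whenever one exists) are our own choices, made so that the output matches the table above.

```python
from itertools import chain

# Actions and sensor of the three-state example (assumed from the text).
ACTIONS = {
    "A1": {"s1": {"s2", "sG"}, "s2": {"s2"}, "sG": {"sG"}},
    "A2": {"s1": {"s1"}, "s2": {"sG"}, "sG": {"sG"}},
}
INTERPRETATIONS = [frozenset({"sG"}), frozenset({"s1", "s2"})]
GOAL = frozenset({"sG"})
KNOWLEDGE_STATES = [frozenset({"s1", "s2"}), frozenset({"s1"}),
                    frozenset({"s2"}), GOAL]


def action_sense(action, knowledge):
    """Knowledge states that may result from `action` followed by sensing."""
    projected = frozenset(chain.from_iterable(ACTIONS[action][s] for s in knowledge))
    return {projected & r for r in INTERPRETATIONS if projected & r}


def build_table(max_steps):
    """table[n][K] is an action guaranteed to reach the goal from K in <= n steps."""
    table = [{GOAL: "stop"}]
    for _ in range(max_steps):
        previous, column = table[-1], {}
        for k in KNOWLEDGE_STATES:
            if k in previous:
                column[k] = previous[k]  # already solvable with fewer steps
                continue
            for name in sorted(ACTIONS):
                # Every possible outcome must be solvable in the previous column.
                if all(outcome in previous for outcome in action_sense(name, k)):
                    column[k] = name
                    break
        table.append(column)
    return table
```

Knowledge states missing from a column correspond to the blank entries of the table.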

Let  us  relate  this  notation  to  the  preimage  planning  methodology  developed  by 
[LMT] (see also chapter 4). The entry in column 1 says that the preimage of the goal
under action $A_2$ is the set $\{s_2\}$. This means that if the system knows that it is in
state $s_2$, then executing action $A_2$ followed by a sensing operation is guaranteed to
attain the goal. This is written as $P_{R,A_2}(\{s_G\}) = R$, for $R = \{s_2\}$. Similarly, the
top entry of column 2 comes from the fact that the set $\{s_1, s_2\}$ is the preimage under
action $A_1$ of the two sub-goals $\{s_2\}$ and $\{s_G\}$. In the preimage methodology this is
written as $P_{R,A_1}(\{s_2\}, \{s_G\}) = R$, for $R = \{s_1, s_2\}$. This means that if the system
starts out knowing only that it is in either state $s_1$ or state $s_2$, then after action $A_1$
and a sensing operation, the system will have traversed to either state $s_2$ or the goal
$s_G$, and the system will know which state it has attained. Figure 3.16 depicts this
graphically.


3.9  Randomization with Non-Deterministic Actions

As  we  have  formulated  the  problem  thus  far,  the  planner  constructs  a  circuit  of 
knowledge  states  by  backchaining  from  the  goal.  The  problem  is  considered  solved  if 
one  of  these  knowledge  states  contains  the  initial  state  of  the  system.  This  is  what 
is  meant  by  a  guaranteed  solution  throughout  this  thesis.  For  some  tasks  however, 
there  is  no  such  guaranteed  solution.  We  mentioned  this  in  the  introduction.  We  will 
quickly  review  the  example  of  figure  1.17  on  page  42  from  the  introduction.  Assume 
that  the  goal  is  recognizable,  that  is,  that  the  sensing  function  for  this  problem  can 
detect  entry  into  the  goal  state.  If  the  initial  state  of  the  system  is  known  exactly, 
then  there  is  a  simple  solution  for  attaining  the  goal.  Specifically,  if  the  system  is  in 
state $s_1$, then execute action $A_1$, whereas if the system is in state $s_2$, then execute
action $A_2$. On the other hand, if the initial knowledge state is the set $\{s_1, s_2\}$ then
there is no sequence of actions guaranteed to attain the goal. Fortunately, there exists
a randomized solution that is expected to attain the goal very quickly. This solution
consists  of  guessing  the  state  of  the  system,  then  executing  the  action  appropriate  for 
that  state.  In  this  simple  example,  there  are  two  possible  choices  for  the  starting  state. 
Thus,  with  probability  1/2  the  randomized  strategy  will  guess  the  correct  starting 
state.  It  follows  that  the  expected  time  until  goal  attainment  is  two  attempts. 
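
The figure of two attempts is the mean of a geometric distribution. As a short check, assume (this is not spelled out in the passage) that successive attempts are independent, that each guess is correct with probability $p$, and that a failed attempt leaves the system in a situation from which guessing again is still valid. Writing $T$ for the number of attempts until the goal is attained:

```latex
\[
  E[T] \;=\; \sum_{k \ge 1} k \, p \, (1-p)^{k-1} \;=\; \frac{1}{p},
  \qquad\text{so that}\qquad p = \tfrac{1}{2} \;\Longrightarrow\; E[T] = 2 .
\]
```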

This  same  approach  of  randomizing  the  initial  state  may  of  course  be  applied  even 
if  there  exists  a  guaranteed  solution.  The  motivation  would  be  to  find  a  randomized 
solution  that  requires  fewer  steps  on  average  than  the  guaranteed  solution. 


3.9.1  Guessing  the  Starting  State 

Let  us  specify  formally  the  relationship  of  randomization  by  guessing  with  the 
guaranteed  planning  approach.  As  usual,  we  will  view  the  planning  process  in  terms 
of  backchaining,  and  specifically,  in  terms  of  dynamic  programming  in  the  space 
of knowledge states. Consider the column in the dynamic programming table that
corresponds to $i$ steps remaining in the strategy. Consider all the knowledge states
$\{K_{i,1}, K_{i,2}, \ldots, K_{i,l_i}\}$ in this column that have non-blank entries. These are all the
knowledge states for which there exists a sequence of at most $i$ actions guaranteed to
attain the goal. This collection is precisely the set $\mathcal{K}_i$, in the notation of claim 3.12.
Suppose that $\mathcal{I}_0$ is the initial knowledge state of the system. If $\mathcal{I}_0 = K_{i,j}$ for some $j$,
then there is a guaranteed strategy consisting of no more than $i$ steps that will attain
the goal. More generally, however, we may have that

$$\mathcal{I}_0 \subseteq \bigcup_{j=1}^{l_i} K_{i,j}.$$

In that case there exists a randomized strategy that consists of guessing an effective
knowledge state. To see this, suppose that $\mathcal{I}_0$ is of the form $\{s_1, s_2, \ldots, s_q\}$. In other
words, there are $q$ (with $q \le n = |S|$) possible starting states of the system. Thus
there exist $q$ or fewer knowledge states in the collection $\mathcal{K}_i$ which cover $\mathcal{I}_0$. We may
thus assume that

$$\mathcal{I}_0 \subseteq \bigcup_{j=1}^{q} K_{i,j}.$$

A randomized strategy consists of guessing between these $q$ knowledge states, then
executing the proper sequence of actions designed to attain the goal. For instance,
if the system were to guess that $K_{i,j}$ is a knowledge state that contains the actual
starting state of the system, then henceforth the system would execute all actions
and sensing operations as if the initial knowledge state really had been $K_{i,j}$ instead
of $\mathcal{I}_0$. Since the states $\{K_{i,1}, K_{i,2}, \ldots, K_{i,q}\}$ cover $\mathcal{I}_0$, the guess will be correct with
probability no less than $1/q$. Thus with probability at least $1/q$ the goal will be
successfully attained with a strategy requiring $i$ or fewer steps.
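
The covering argument can be sketched in code. The fragment below is only an illustration, assuming that knowledge states are represented as frozensets and that the guess is made uniformly at random; the helper names `choose_cover` and `guess_knowledge_state` are hypothetical. It selects at most one solvable knowledge state per possible starting state, so the cover has at most $q$ members and a uniform guess contains the true start state with probability at least $1/q$.

```python
import random


def choose_cover(initial_states, solvable):
    """Pick at most one solvable knowledge state per possible start state."""
    cover = []
    for state in sorted(initial_states):
        if any(state in k for k in cover):
            continue  # already covered by an earlier choice
        candidates = [k for k in solvable if state in k]
        if not candidates:
            raise ValueError(f"no solvable knowledge state contains {state!r}")
        cover.append(candidates[0])
    return cover


def guess_knowledge_state(cover, rng=random):
    """Guess uniformly among the covering knowledge states."""
    return rng.choice(cover)
```

If some starting state lies in no solvable knowledge state of the column, no cover exists and the sketch raises an error, which corresponds to the covering hypothesis failing.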

A  simple  example  of  this  state-guessing  approach  is  given  by  the  two-dimensional 
peg-in-hole  problem  of  figure  2.2.  If  the  resolution  of  the  sensor  is  not  good  enough 
to  determine  whether  the  peg  is  to  the  left  or  to  the  right  of  the  hole,  then  a  useful 
strategy  is  simply  to  guess  the  side  on  which  the  peg  is  located.  Depending  on  the 
outcome  of  the  guess,  the  system  then  moves  either  right  or  left.  If  the  guess  is  correct, 
then the peg winds up in the hole. If the guess is incorrect, then the system can either
guess  again  or  use  the  failure  as  information  to  select  the  appropriate  direction  of 
motion. 

A  more  complex  example  will  be  given  in  the  continuous  domain  in  chapter  4.  See 
in  particular  figure  4.8. 


3.9.2  Execution  Traces 

In  order  to  gain  some  intuition  as  to  the  types  of  execution  traces  that  might  occur, 
let  us  consider  a  randomized  system  at  execution  time.  The  system  first  guesses  its 
starting  knowledge  state,  then  executes  some  appropriate  strategy.  This  strategy  is 
a  guaranteed  strategy  for  attaining  the  goal,  in  the  sense  that  the  strategy  would 
reliably  and  recognizably  attain  the  goal  if  the  system  knew  for  certain  its  starting 
knowledge  state.  However,  since  the  starting  knowledge  state  is  merely  guessed,  it  is 
possible  that  the  system  may  encounter  an  inconsistency  at  execution  time,  reflected 
by  the  empty  knowledge  state.  We  assume  that  the  system  ceases  execution  of  its 
current guessed strategy should it ever encounter the empty set as a knowledge state
at  run-time. 

In  general,  the  system  might  actually  be  able  to  decide  that  it  has  attained  the 
goal,  even  though  an  inconsistency  has  occurred  (see  claim  3.14  below).  This  decision 
involves  an  additional  test,  that  essentially  takes  into  account  the  effect  of  all  the  past 
actions  and  sensory  interpretations  on  the  entire  range  of  possible  starting  states,  not 
just  on  those  in  the  guessed  knowledge  state. 

There  are  thus  two  different  notions  of  failure  to  recognizably  attain  the  goal. 
One  notion  refers  to  failure  relative  to  the  guessed  starting  region.  This  failure  is 
evidenced  either  by  the  occurrence  of  an  inconsistency  or  by  the  presence  of  non-goal 
states  in  the  knowledge  state  derived  from  the  guessed  starting  knowledge  state.  The 
other  notion  refers  to  failure  of  the  more  accurate  test,  which  takes  into  account  all 
possible  starting  states.  No  inconsistencies  can  occur  here,  and  thus  this  failure  is 
evidenced  merely  by  the  presence  of  non-goal  states  in  the  knowledge  state  derived 
from  the  initial  starting  region  I0.  Either  notion  of  failure  is  reasonable,  depending 
on  whether  the  more  accurate  test  is  implemented. 

Suppose  that  a  failure,  by  either  definition,  does  occur.  Then,  under  suitable 
conditions,  the  system  may  guess  a  new  starting  knowledge  state,  execute  a  new 
strategy for the newly guessed knowledge state, and so forth, repeatedly, until the
goal  is  finally  attained.  We  will  elaborate  on  these  conditions  shortly. 

In  short,  there  are  a  couple  of  subtleties  that  need  to  be  addressed.  The  first 
issue  deals  with  the  behavior  of  the  system  if  it  guesses  the  wrong  starting  state.  The 
second  issue  deals  with  repeated  guessing. 

3.9.3  Incorrect  Guessing 

First,  consider  the  behavior  of  the  system  if  it  guesses  the  wrong  starting  state.  There 
are  four  possible  results:  (1)  The  system  completes  execution  without  thinking  that 
it has attained the goal (although it may have), (2) the system thinks that it has
attained  the  goal  when  indeed  it  has,  (3)  the  system  thinks  that  it  has  attained  the 
goal  when  in  fact  it  has  not,  and  (4)  the  system  encounters  an  inconsistency  during 
execution. 

The  first  two  of  these  scenarios  are  standard  and  do  not  require  elaboration.  As 
an  aside,  let  us  note  that  scenario  number  one  does  not  actually  occur  in  the  current 
context.  This  is  because  the  system  is  executing  a  strategy  that  is  guaranteed  to 
attain  the  goal  state  from  the  (incorrectly)  guessed  starting  state.  Thus,  either  an 
inconsistency  must  occur  during  execution,  or  the  system  must  eventually  believe 
that  it  has  attained  the  goal. 

In  order  to  understand  the  other  two  possibilities,  imagine  the  behavior  of  the 
system if it guesses a knowledge state $K_{i,j}$ that does not contain the actual initial
state of the system. At each step, the system will perform some action and some
sensing operation as specified by the dynamic programming table. This action and the
returned sensed value are used to update the knowledge state, in the manner described
in section 3.2.5. However, since the knowledge state at each step of execution may
not contain the actual state of the system, the resulting sensory interpretation sets
may not be consistent with the predicted forward projection of the knowledge state.
In other words, the set $K_2 = F_A(K_1) \cap I$ may be empty, where $K_1$ is some knowledge
state, $F_A(K_1)$ is the forward projection of $K_1$ under some action $A$, and $I$ is the result
of some sensing operation. One sees then that under this randomized strategy, the
empty set can appear as a knowledge state. If ever it does appear, then the system
knows that it has guessed incorrectly, and that it should stop execution. In fact,
inconsistencies can arise more generally, if the full sensing consistency requirement
is not satisfied. At any sensing time, the system knows what the possible sensor
values are that it should be able to see. If a different sensor value actually appears,
then an inconsistency has occurred, and the system knows that it originally guessed
incorrectly. Said differently, the set $F_A(K_1) \cap' I$ is empty (recall the meaning of $\cap'$
from section 3.2.5). This explains scenario number four.
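
To make scenario four concrete, here is a minimal sketch on the three-state example of section 3.8, assuming the same actions and goal/non-goal sensor as in that example and using the plain intersection $F_A(K_1) \cap I$ for the update (the refinements of section 3.2.5 are ignored). The system guesses $\{s_2\}$ while actually starting in $s_1$; the names below are illustrative only.

```python
# Underlying actions of the three-state example (state -> possible next states).
ACTIONS = {
    "A1": {"s1": {"s2", "sG"}, "s2": {"s2"}, "sG": {"sG"}},
    "A2": {"s1": {"s1"}, "s2": {"sG"}, "sG": {"sG"}},
}


def sensor_reading(state):
    """Interpretation set returned when the system is actually in `state`."""
    return frozenset({"sG"}) if state == "sG" else frozenset({"s1", "s2"})


def update(knowledge, action, reading):
    """Forward-project `knowledge` under `action`, then intersect with `reading`."""
    projected = frozenset(t for s in knowledge for t in ACTIONS[action][s])
    return projected & reading


def detect_wrong_guess():
    """Guess {s2} while really in s1, then execute A2 and sense."""
    guessed = frozenset({"s2"})
    actual = "s1"
    (actual_next,) = ACTIONS["A2"][actual]  # A2 is deterministic from s1
    return update(guessed, "A2", sensor_reading(actual_next))
```

Here the predicted forward projection is $\{s_G\}$ but the sensor reports a non-goal state, so `detect_wrong_guess()` returns the empty set and the system learns that its guess was wrong.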

Scenario  number  three  can  occur  precisely  when  no  inconsistencies  appear,  despite 
the  initial  guess  having  been  wrong.  In  other  words,  the  execution  trace  of  knowledge 
states from some initial knowledge state $K_{i,j}$ to the goal $\mathcal{G}$ proceeds successfully,
despite the system not being in $K_{i,j}$ initially. In some cases the system may wind up
in $\mathcal{G}$ serendipitously, but this need not be guaranteed. An example is given in figure
3.17. In this example there are two possible starting positions. The action executed
is to move straight down, until a collision with a horizontal edge is detected. There
are two such edges, one of which is the goal. If the system guesses that it has started
at the point $p_2$ (which lies above the goal edge), but is really at location $p_1$, then the
knowledge state at the end of the motion will incorrectly indicate goal attainment.

See also [Don89] for further details on the implications of “lying” to a system at
run-time  by  specifying  the  wrong  start  location.  Donald  has  used  this  technique  in  his 
work  on  Error  Detection  and  Recovery  to  suggest  multi-step  strategies  for  trying  to 
attain  some  goal,  in  such  a  manner  that  the  process  winds  up  distinguishing  between 
those  start  locations  that  are  guaranteed  to  attain  the  goal  and  those  that  merely 
might  attain  the  goal.  Clearly  the  process  of  guessing  the  start  region  has  strong 
connections  to  his  approach,  as  will  become  apparent  in  this  section. 


3.9.4  Goal  Recognizability 

Of  the  four  scenarios,  the  only  troublesome  one  is  this  third,  the  problem  of  false 
goal  recognition.  The  resolution  of  this  problem  requires  an  applicability  condition. 
Essentially  the  idea  is  to  eliminate  all  possible  execution  traces  that  could  lead  to 
confusing  goal  interpretations.  Specifically,  for  any  execution  trace  that  does  indicate 
goal  attainment,  we  want  to  ensure  that  the  same  execution  trace  applied  to  other 
possible  initial  knowledge  states  either  also  indicates  goal  attainment  or  leads  to 
an  inconsistency.  We  will  state  this  condition  formally,  then  simply  enforce  it  by 
assuming  that  the  goal  is  recognizable  independent  of  any  history,  that  is,  any 
particular  execution  trace. 

Figure 3.17: The system starts in one of the two indicated locations, moves downward,
and detects contact with a horizontal surface. If the system knows that it started at
location $p_2$, then the contact signals goal attainment. However, if the system merely
guessed that it started at $p_2$, then the force sensor may falsely signal goal attainment.
[Figure labels: commanded velocity; start locations $p_1$ and $p_2$; goal edge (detected with force sensing).]

In order to state the condition formally let us introduce some temporary notation.
This discussion applies to the non-deterministic setting, but not necessarily to the
probabilistic setting. Given a starting knowledge state $K$, and a non-deterministic
action $\mathcal{A}$ in knowledge space, let us write the effect during execution of this action on
$K$ as $K; A; I$, where $A$ is the generating non-deterministic action in the underlying
state space, and $I$ is a sensory interpretation set that is returned by the sensor
at execution time. In other words, $K_2 = K; A; I$, where $K_2$ is the knowledge
state determined from $K$ in the manner of section 3.2.5, namely as $F_A(K) \cap' I$, the
intersection of the sensory interpretation set with the forward projection of the start
state. More generally, given a sequence of actions $\{A_1, A_2, \ldots, A_k\}$ and an associated
sequence of run-time sensory interpretation sets $\{I_1, I_2, \ldots, I_k\}$, the effect on $K$ will be
denoted by $K; A_1; I_1; A_2; I_2; \cdots; A_k; I_k$. If ever a sensory interpretation set is returned
that is inconsistent with the possible sensory interpretation sets expected at that
point, the resulting knowledge state is simply the empty set $\emptyset$. For consistency, we
therefore define $\emptyset; A; I = \emptyset$ for any action $A$ and any sensory interpretation set $I$.

Suppose now that the system guesses that the initial knowledge state is the set
$K_0$. The strategy for attaining the goal $\mathcal{G}$ from $K_0$ is encoded in the dynamic
programming table. Suppose that the first action, $\mathcal{A}_1$, is taken from the entry for
$K_0$ in the $k$th column of the dynamic programming table. Execution of $\mathcal{A}_1$ involves
execution of some action $A_1$ on the underlying state space, followed by some sensory
observation that yields a sensory interpretation set $I_1$. Once $\mathcal{A}_1$ has been executed,
the resulting knowledge state determines the next action to perform. This action
$\mathcal{A}_2$ is again encoded in the dynamic programming table. Action $\mathcal{A}_2$ in turn results
in some new run-time sensory interpretation set $I_2$, and so forth. If the initial state
of the system was indeed covered by the starting knowledge state $K_0$, then after $k$
actions the resulting knowledge state will be non-empty and inside the goal, that
is, $\emptyset \ne K_0; A_1; I_1; A_2; I_2; \cdots; A_k; I_k \subseteq \mathcal{G}$. The precise sequence is of course not
determined until execution time. On the other hand, if the initial state of the system
was not covered by $K_0$, then the final knowledge state may or may not be empty, and
may or may not accurately depict whether the goal has been attained, as explained
above.

Now consider the effect of the sequence $A_1; I_1; A_2; I_2; \cdots; A_k; I_k$ on knowledge
states other than the assumed starting knowledge state $K_0$. In particular, consider
$\{s_i\}; A_1; I_1; \cdots; A_k; I_k$ for all singleton knowledge states $\{s_i\}$. Suppose that for
each possible starting state $s_i$, the final knowledge state $\{s_i\}; A_1; I_1; A_2; I_2; \cdots; A_k; I_k$
is either the empty set $\emptyset$ or lies inside the goal $\mathcal{G}$. Then clearly the goal must have
been attained, even if the initial guess $K_0$ was wrong! Conversely, suppose that for
some state $s_i$, the final knowledge state is non-empty and includes states outside of
the goal. If $s_i$ could have been a starting state of the system, then one cannot be sure
that the system has entered the goal. This establishes the following claim.

Claim 3.14 Consider a discrete planning problem $(S, A, \Xi, \mathcal{G})$ for which the full
sensing consistency requirement holds. Suppose the initial state of the system is
known to lie in some subset $\mathcal{I}_0 \subseteq S$. Suppose further that there exists a guaranteed
strategy for attaining the goal in $k$ steps if the initial state were actually known to
be in the set $K_0$, with $K_0 \subseteq \mathcal{I}_0$. Imagine that the system executes this strategy as
if the initial knowledge state were indeed $K_0$. Let the execution trace be given by
$A_1; I_1; A_2; I_2; \cdots; A_k; I_k$. Then the system is guaranteed to have attained the goal if
and only if $\mathcal{I}_0; A_1; I_1; A_2; I_2; \cdots; A_k; I_k \subseteq \mathcal{G}$.

[Notice that $K_0; A_1; I_1; A_2; I_2; \cdots; A_k; I_k$ may be the empty set, if the initial state of
the system is not in $K_0$. However, the knowledge state $\mathcal{I}_0; A_1; I_1; A_2; I_2; \cdots; A_k; I_k$
must be non-empty, since the system is known to have started in the set $\mathcal{I}_0$, and since
sensing is at least partially consistent.]

Proof. The claim follows from the discussion above, and the fact that

$$\bigcup_{s \in K} \left( \{s\}; A; I \right) = K; A; I,$$

for any knowledge state $K$, by lemmas 3.1, 3.2, and 3.3. ∎
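
The test of claim 3.14 is easy to sketch. The fragment below uses a hypothetical discrete abstraction of figure 3.17: the states `p1` and `p2` are the two start locations, while `e1` (the non-goal edge) and `g` (the goal edge) are names we introduce for illustration. The force sensor is assumed to report only "contact", and the plain intersection is used for the update.

```python
# Hypothetical discrete abstraction of figure 3.17 (state names are ours).
ACTIONS = {"down": {"p1": {"e1"}, "p2": {"g"}, "e1": {"e1"}, "g": {"g"}}}
CONTACT = frozenset({"e1", "g"})  # force sensor reports contact with some edge
GOAL = frozenset({"g"})


def update(knowledge, action, reading):
    """Forward-project `knowledge` under `action`, then intersect with `reading`."""
    projected = frozenset(t for s in knowledge for t in ACTIONS[action][s])
    return projected & reading


def run_trace(knowledge, trace):
    """Compute K; A_1; I_1; ...; A_k; I_k for a list of (action, reading) pairs."""
    for action, reading in trace:
        knowledge = update(knowledge, action, reading)
    return knowledge


def goal_certain(initial, trace, goal):
    """Claim 3.14: the trace applied to all of I_0 must end inside the goal."""
    return run_trace(initial, trace) <= goal


I0 = frozenset({"p1", "p2"})  # true initial uncertainty
K0 = frozenset({"p2"})        # guessed starting knowledge state
trace = [("down", CONTACT)]

believed = run_trace(K0, trace)          # what the guessed knowledge state says
certain = goal_certain(I0, trace, GOAL)  # what the more accurate test says
```

In this abstraction `believed` equals the goal while `certain` is false, which is exactly the false goal recognition of scenario three. The union of the singleton traces also equals the trace applied to `I0`, as the identity in the proof requires.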

As  an  aside,  notice  that  the  proof  of  the  claim  never  made  use  of  the  fact  that 
the  execution  trace  was  the  result  of  executing  a  strategy  guaranteed  to  move  Ko  to 
the  goal.  This  suggests  that  the  claim  holds  for  any  strategy,  and  indeed  it  does,  but 
this  is  not  of  use  in  this  context. 

Definition.  Let  us  define  the  phrase  the  strategy  is  assured  of  reliable  goal 
recognition  from  K0  to  mean  that  any  execution  trace  of  the  strategy,  which 
transforms  K0  into  a  non-empty  knowledge  state  within  the  goal,  actually  implies 
goal  attainment. 

With  the  same  hypotheses  as  the  claim  above,  one  obtains  the  following  corollary. 
The  corollary  is  merely  a  restatement  of  the  definition  of  reliable  goal  recognition. 

Corollary 3.15 Suppose that a randomized strategy guesses that the system is in $K_0$,
and plans to execute the guaranteed strategy for $K_0$, even though the actual state of
the system may be in $\mathcal{I}_0 - K_0$. The strategy is assured of reliable goal recognition from
$K_0$ if and only if $\mathcal{I}_0; A_1; I_1; A_2; I_2; \cdots; A_k; I_k \subseteq G$ for all possible execution traces that
might occur for which $\emptyset \neq K_0; A_1; I_1; A_2; I_2; \cdots; A_k; I_k \subseteq G$.

[Observe that the collection of possible execution traces is the union over all possible
starting states in $\mathcal{I}_0$ of execution traces that might occur when executing the
guaranteed strategy for $K_0$, not just the possible execution traces that might occur
when executing the guaranteed strategy for $K_0$ knowing that the initial state is in
$K_0$. However, the corollary only requires consideration of those execution traces that
are consistent with $K_0$.]

The  condition  of  this  corollary  forms  the  applicability  condition  for  a  randomized 
strategy. If the condition is satisfied for all possible knowledge states $K_i$ that might
be  guessed,  then  false  goal  recognition  is  avoided. 
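
The applicability condition of corollary 3.15 can be phrased as a simple check over a set of execution traces. The following is a minimal sketch under stated assumptions: actions are dictionaries from states to sets of possible successors, a trace is a list of (action, sensory interpretation set) pairs, the update $K; A; I$ is the forward projection intersected with $I$, and the caller supplies the traces to examine (enumerating all possible traces is not attempted). All names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

State = str
Action = Dict[State, Set[State]]         # hypothetical encoding: state -> possible successors
Trace = List[Tuple[Action, Set[State]]]  # [(A_1, I_1), ..., (A_k, I_k)]


def run_trace(K: Iterable[State], trace: Trace) -> FrozenSet[State]:
    """Propagate a knowledge state through a trace (forward projection, then sensing)."""
    current = frozenset(K)
    for A, I in trace:
        reachable: Set[State] = set()
        for s in current:
            reachable |= A.get(s, set())
        current = frozenset(reachable & I)
    return current


def assured_of_reliable_goal_recognition(K0, I0, G, traces) -> bool:
    """Check the condition of corollary 3.15 over a caller-supplied list of traces."""
    for trace in traces:
        from_guess = run_trace(K0, trace)
        if from_guess and from_guess <= G:      # local test relative to K0 succeeds
            if not run_trace(I0, trace) <= G:   # but the test relative to I0 fails
                return False
    return True


# Toy example (illustrative values only): pushing moves a and b into the goal g, but not c.
push = {"a": {"g"}, "b": {"g"}, "c": {"c"}}
K0, I0, G = {"a"}, {"a", "b", "c"}, frozenset({"g"})
sharp_trace = [(push, {"g"})]        # sensor isolates the goal
blurry_trace = [(push, {"g", "c"})]  # sensor cannot tell g from c
```

With the sharp sensor the local test implies goal attainment; with the blurry sensor the guess $K_0$ would report success even though the system might be in state c.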

As an aside, observe that if one does implement the more accurate test to
determine whether $\mathcal{I}_0; A_1; I_1; A_2; I_2; \cdots; A_k; I_k \subseteq G$, then corollary 3.15 is irrelevant.
The corollary really tells us the conditions under which a local test relative to the
guessed starting state $K_0$ is sufficient to ensure global goal attainment.

A couple of additional comments are in order. First, a quick reading of the
corollary suggests that goal recognition is only reliable if the entire possible starting
region $\mathcal{I}_0$ is guaranteed to attain the goal. If that were indeed true, all this discussion
would be absurd, since one could simply apply the guaranteed strategy applicable to
$\mathcal{I}_0$ rather than $K_0$. In fact, however, the corollary merely asserts that any execution
trace starting from $K_0$, for which the final knowledge state derived from $K_0$ is
non-empty and lies inside the goal, must also place the final knowledge state derived
from $\mathcal{I}_0$ inside the goal. It is quite possible that on a particular execution trace
the final knowledge state $K_0; A_1; I_1; A_2; I_2; \cdots; A_k; I_k$ is empty. In that case, the
result of applying the strategy to $\mathcal{I}_0$ clearly need not achieve the goal. As we see
from claim 3.14, the goal might actually be attained, but this is not guaranteed.
Thus the randomized strategy would signal failure of its current attempt, based on
the recognition that it had guessed wrong initially. In short, there need not be a
guaranteed strategy for attaining the goal from $\mathcal{I}_0$.

The  second  comment  concerns  the  relationship  of  the  corollary  to  Donald’s  work 
on Error Detection and Recovery [Don89]. He, as we, was interested in executing a
strategy  from  some  large  starting  region,  although  the  strategy  was  only  guaranteed 
to  attain  the  goal  from  some  smaller  subregion.  The  condition  he  placed  on  such  a 
strategy  was  that  it  terminate  by  either  recognizing  goal  attainment  or  recognizing 
attainment  of  a  region  from  which  goal  attainment  is  impossible.  The  situation  in  our 
case  is  slightly  different.  In  particular,  as  we  shall  see,  the  randomized  strategy  will 
actually  loop  over  several  attempts,  on  each  making  a  new  guess  as  to  the  effective 
starting  state.  After  all,  we  have  assumed  that  the  large  starting  region  is  covered 
by  a  union  of  smaller  regions,  for  each  of  which  there  exists  a  guaranteed  strategy. 
This  is  a  more  stringent  requirement  than  that  Donald  asked  of  his  starting  regions. 
Additionally,  whereas  Donald  required  his  strategies  to  either  recognize  success  or 
failure,  we  have  simply  defined  failure  to  be  the  lack  of  success.  Indeed,  it  may 
happen  that  the  strategy  terminates  thinking  it  has  failed  when  in  fact  the  state  of 
the  system  is  inside  the  goal.  Our  only  requirement  is  that  if  the  strategy  thinks 
that  it  has  attained  the  goal,  then  indeed  it  has.  This  is  a  weaker  requirement,  one 
that is more easily satisfied. It is enough for our purposes, since on each iteration of
the  randomized  strategy,  there  is  some  non-zero  probability  of  guessing  the  correct 
starting  state,  and  thus  some  non-zero  probability  of  terminating  successfully. 

3.9.5  Repeated  Goal  Reachability 

The second issue that needs to be addressed concerns the behavior of the randomized
strategy upon failure.3 Thus far we have merely asked that the strategy guess a
starting knowledge state and execute a strategy guaranteed to achieve the goal if the
guess is correct. If the guess is incorrect and the strategy fails to achieve the goal, then
one needs to worry about how to proceed. One possibility is that the new resulting
knowledge state at execution time is one of those for which a guaranteed strategy
exists. In other words, a non-blank entry appears in the dynamic programming table
for that knowledge state. Another possibility is that the new knowledge state is
the union of several smaller knowledge states for which guaranteed strategies exist.
More generally, however, there may not be any way to proceed. This leads to a
second applicability condition.

3 As before, failure can have two meanings, either relative to the guessed starting region, or relative
to the entire starting region. Either meaning is acceptable. See section 3.9.2.

Consider a $k$-column dynamic programming table. Suppose that the initial state
of the system is known to lie in some subset $\mathcal{I}_0$ of the state space. Assume as before,
that there is a collection of at most $n = |S|$ knowledge states that cover the set $\mathcal{I}_0$,
from each of which there is a guaranteed strategy of $k$ steps for attaining the goal.
Now let us go one step further. Consider the $i$th column of the table, and define $\mathcal{D}_i$
(for $i = 1, \ldots, k$) to be the union of all knowledge states whose entries in this column
are non-blank. In other words, $\mathcal{D}_i$ is the union of all knowledge states for which
there exists a strategy of $i$ or fewer steps guaranteed to attain the goal. (Note that
we have $\mathcal{I}_0 \subseteq \mathcal{D}_k$.) If ever the actual knowledge state $K$ is a subset of the set $\mathcal{D}_i$,
then it is possible to guess between a collection of knowledge states from which goal
attainment is possible. The guess involves at most $n$ choices. If it involves exactly
one choice, then the strategy is in fact guaranteed to attain the goal. In general,
one must worry about false goal recognition, using now the knowledge state $K$ in
place of $\mathcal{I}_0$ in corollary 3.15. An applicability condition can now be stated, which
simply says that for all possible execution traces the system always winds up in one
of the $\{\mathcal{D}_i\}$. In other words, no execution trace should ever enter a blank entry in the
dynamic programming table. This is quite a difficult condition to state generally in
any meaningful way, partly because one must now look at execution traces that may
be longer than $k$ steps, and partly because the false goal recognition condition enters
into the picture. Instead we will state a weaker condition, then show how to satisfy
it with a very simple assumption.

Definition.  Recall  that  a  randomized  strategy  repeatedly  guesses  its  initial 
starting region $K_i$, then executes some guaranteed strategy for attaining the goal from
$K_i$. The execution terminates either with goal attainment or failure. We will refer to
each  such  guess  and  strategy  execution  as  a  single  guessing  loop  of  the  randomized 
strategy. 

Definition.  We  will  say  that  a  randomized  strategy  may  be  reliably  restarted 
if,  whenever  it  fails  to  attain  the  goal  recognizably  on  a  single  guessing  loop,  it 
recognizably lies within its initial starting region $\mathcal{I}_0$.

The  following  claim  establishes  a  nice  complement  to  corollary  3.15.  To  verify 
that  the  strategy  may  be  reliably  restarted  in  general  one  of  course  needs  to  check 
the condition of the claim for all possible knowledge states $K_i$ that the randomized
strategy  might  guess  (recall  there  are  at  most  n  of  them).  The  claim  is  essentially  a 
restatement  of  the  definition  of  reliable  restart,  but  with  a  slightly  stronger  condition. 

Claim 3.16 Assume the hypotheses of claim 3.14, and suppose that the guaranteed
strategy for $K_0$ is assured of reliable goal recognition from $K_0$. The randomized strategy
may be reliably restarted if and only if $\mathcal{I}_0; A_1; I_1; A_2; I_2; \cdots; A_k; I_k \subseteq \mathcal{I}_0$ for all possible
execution traces that might occur which fail to attain the goal recognizably and for
which $K_0; A_1; I_1; A_2; I_2; \cdots; A_k; I_k$ is either empty or contains non-goal points.
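
The restart condition of claim 3.16 can be checked in the same style as the goal-recognition condition. The sketch below is only an illustration under assumptions not taken from the text: actions map states to sets of possible successors, a trace is a list of (action, sensory interpretation set) pairs, the knowledge-state update is forward projection intersected with the sensed set, and the traces to examine are supplied by the caller. The function names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

State = str
Action = Dict[State, Set[State]]         # hypothetical encoding: state -> possible successors
Trace = List[Tuple[Action, Set[State]]]  # [(A_1, I_1), ..., (A_k, I_k)]


def run_trace(K: Iterable[State], trace: Trace) -> FrozenSet[State]:
    """Propagate a knowledge state through a trace (forward projection, then sensing)."""
    current = frozenset(K)
    for A, I in trace:
        reachable: Set[State] = set()
        for s in current:
            reachable |= A.get(s, set())
        current = frozenset(reachable & I)
    return current


def may_be_reliably_restarted(K0, I0, G, traces) -> bool:
    """Check that every failing trace leaves the I0-derived knowledge state inside I0."""
    start = frozenset(I0)
    for trace in traces:
        from_guess = run_trace(K0, trace)
        failed = (not from_guess) or not (from_guess <= G)  # empty, or has non-goal points
        if failed and not run_trace(I0, trace) <= start:
            return False
    return True


# Toy example (illustrative values only).
K0, I0, G = {"a"}, {"a", "b"}, frozenset({"g"})
stay = {"a": {"g"}, "b": {"b"}}    # a wrong guess leaves the system in b
drift = {"a": {"g"}, "b": {"x"}}   # a wrong guess moves the system outside I0
stay_trace = [(stay, {"b"})]
drift_trace = [(drift, {"x"})]
```

In the first trace a failed attempt leaves the system recognizably inside the start region, so guessing can resume; in the second it does not.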


3.9.6  Observations  and  Assumptions 

Notice  that  if  a  strategy  both  is  assured  of  reliable  goal  recognition  and  may  be 
reliably restarted for all relevant knowledge states $K_i$ that cover $\mathcal{I}_0$, then whenever a
single guessing loop of the randomized strategy is executed from the region $\mathcal{I}_0$, it is
guaranteed to attain recognizably either the goal or again the region $\mathcal{I}_0$ itself. This
condition  is  in  appearance  very  similar  to  Donald’s  EDR  condition  (see  page  100  of 
[Don89]),  which  insists  that  a  strategy  be  guaranteed  to  attain  recognizably  either 
the  goal  or  a  region,  called  the  failure  region,  from  which  success  is  not  possible.  One 
difference  is  that  our  failure  region  is  the  start  region  itself. 

Another  related  difference  is  that  the  condition  does  not  work  in  reverse.  In  other 
words,  the  converse  statement  that  recognizable  attainment  of  the  goal  or  the  start 
region  implies  reliable  goal  recognition  and  reliable  restart  is  simply  not  true.  After 
all,  if  the  start  region  is  the  entire  state  space,  then  any  strategy  is  guaranteed  to 
attain  recognizably  either  the  goal  or  the  start  region,  but  the  strategy  need  not 
satisfy  the  condition  of  reliable  goal  recognition. 

The failure of the converse statement suggests that verifying reliable goal
recognition and reliable restart is in general quite difficult. However, both conditions
are easily satisfied if we make two special assumptions.

Assumption  of  Goal  Recognizability.  First,  we  will  assume  that  the  goal  is 
recognizable  independent  of  any  particular  execution.  This  means  that  if  the  sensor 
signals  goal  attainment  then  the  goal  has  indeed  been  attained,  and  conversely,  if  the 
goal  is  entered  then  the  sensor  will  signal  goal  attainment. 

Assumption  of  Covering  Start  Region.  Second,  we  will  assume  that  the  start 
region  for  any  guessing  strategy  is  the  entire  state  space.  In  general,  one  can  relax 
this  assumption  by  considering  only  that  portion  of  the  state  space  that  might  ever 
be  traversed. 

One final comment is in order. When the guessing strategy fails and decides to
guess anew, it need in general not guess between the $q$ possible knowledge states
that cover the starting region $\mathcal{I}_0$, but only between those knowledge states that cover
the new start region $\mathcal{I}'_0 = \mathcal{I}_0; A_1; I_1; A_2; I_2; \cdots; A_k; I_k$ determined by the most recent
execution trace. This can sometimes speed up convergence. In particular, if $\mathcal{I}'_0$ is
actually equal to one of the knowledge states for which a guaranteed strategy exists,
then the randomized strategy is assured of convergence on the next attempt.

3.10 Comparison of Randomized and Guaranteed Strategies


Suppose  one  is  in  the  fortunate  situation  of  having  both  a  guaranteed  strategy  for 
attaining  a  goal  and  a  randomized  strategy  of  the  type  just  discussed.  One  question 
is  whether  it  ever  makes  sense  to  use  the  randomized  strategy.  The  answer  is 
yes,  assuming  that  the  expected  convergence  time  for  the  randomized  strategy  is 
significantly  less  than  the  convergence  time  for  the  guaranteed  strategy.  In  order  to 
set  up  this  comparison,  let  us  suppose  that  the  guaranteed  strategy  for  the  starting 
state $\mathcal{I}_0$ is found in the $\ell$th column of the dynamic programming table, and let us
suppose that the guessing strategy is found in the $k$th column. Assume that there are
$q$ knowledge states $K_1, \ldots, K_q$ between which the randomized strategy guesses, and
suppose that the guaranteed strategies for these states converge in steps $k_1, \ldots, k_q$,
respectively. In other words, the worst-case convergence time for the guaranteed
strategy for $\mathcal{I}_0$ requires $\ell$ steps, and the worst-case convergence time for $K_i$ requires
$k_i$ steps ($i = 1, \ldots, q$).

If we assume that the randomized strategy always guesses between all possible
$q$ states, then the expected time until convergence is bounded by $\sum_i k_i$, which in
turn is bounded by $qk$. It is a little strange mixing these expected and worst-case
times, but the idea is similar to the example involving random key selection in the
introduction. Essentially, if $qk$ is on the order of $\ell$, or larger, then it doesn't make
much sense to use the randomized strategy. However, if $qk$ is considerably less than
$\ell$ then it is probably a good idea to use the randomized strategy. In particular, if $\ell$
is exponentially large in the problem specification, and $k$ is only polynomially large,
then it always makes sense to use the randomized strategy. This is because, as we
noted early in the chapter, the probability that the randomized strategy will require
more than $t$ attempts is less than $\left(\frac{q-1}{q}\right)^t$. Recall also that $q$ is bounded by $n$. It
follows that for fixed $n$, the strategy converges exponentially fast in the number of
steps. One may worry that as $n$ gets large $q$ may also get large, in which case $(q-1)/q$
approaches unity. This seems to imply that as $n$ gets large one cannot guarantee fast
convergence. Notice, however, that if $t > mq$, where $m$ is some integer and $q$ is large,
then the probability of the randomized strategy requiring more than $t$ steps is less
than $e^{-m}$, so convergence is still fast. In particular, in quadratic time the probability
of failure can be made exponentially small.
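
The two bounds in this comparison are easy to evaluate numerically. The sketch below assumes uniform guessing among $q$ knowledge states, one of which is correct on each attempt, so that each attempt succeeds with probability at least $1/q$; the function names and the sample values of $k_i$ are illustrative and not taken from the text.

```python
import math
from typing import Sequence


def expected_time_bound(step_counts: Sequence[int]) -> int:
    """Sum of the k_i: a bound on expected steps when guessing uniformly among q states."""
    return sum(step_counts)


def tail_bound(q: int, t: int) -> float:
    """((q - 1) / q) ** t: bound on the probability of needing more than t attempts."""
    return ((q - 1) / q) ** t


ks = [3, 5, 2]              # illustrative k_1, ..., k_q
q, k = len(ks), max(ks)
bound = expected_time_bound(ks)

# With t = m * q attempts the tail bound drops below e^{-m}, whatever the size of q.
checks = [(q_, m, tail_bound(q_, m * q_), math.exp(-m))
          for q_ in (2, 10, 1000) for m in (1, 5)]
```

The comparison against $e^{-m}$ reflects the inequality $(1 - 1/q)^q < 1/e$, which is why a large $q$ slows convergence only linearly.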

As an aside, consider how randomization by guessing relates to the labelling
scheme discussed earlier (see section 3.5). Essentially all non-goal states are assigned
the same label, namely the number $k$, while goal states are assigned the label zero.
Then the expected velocity at all non-goal states is at least $-1/q$, when averaged over
each step of a $k$-step strategy, and thus the expected convergence time is bounded
by $kq$. In some sense, by considering composite steps consisting of $k$ basic steps, we
have transformed a non-deterministic problem into a two-state probabilistic problem.

3.11 Multi-Guess Randomization

Thus  far  we  have  only  dealt  with  randomization  by  guessing  the  starting  state  of  the 
system.  In  general,  it  is  equally  possible  to  consider  sequences  of  several  guesses. 
In  other  words,  when  executing  a  strategy,  at  some  point  a  knowledge  state  is 
encountered  that  is  the  union  of  several  smaller  knowledge  states.  Instead  of  executing 
a  strategy  applicable  to  the  larger  knowledge  state,  a  system  could  simply  guess 
between  the  smaller  states,  then  use  strategies  appropriate  for  each  of  these.  In  terms 
of  planning,  the  standard  preimage  or  dynamic  programming  approaches  continue  to 
apply,  but  with  an  additional  operator.  Call  this  operator  SELECT.  SELECT  operates 
as  follows. 


An  Augmented  Dynamic  Programming  Table 

First,  let  us  augment  the  dynamic  programming  table.  Each  column  in  the  dynamic 
programming  table  will  contain  three  types  of  entries,  namely  BLANK,  GUARANTEED, 
and  RANDOMIZED.  The  intuition  is  that  BLANK  and  GUARANTEED  are  as  before. 
Specifically  if  the  entry  for  a  knowledge  state  K  is  a  GUARANTEED  entry  then  there 
exists  a  tree  of  actions  that  is  guaranteed  to  attain  the  goal  assuming  that  the  initial 
state  was  indeed  inside  K.  A  BLANK  entry  implies,  as  before,  that  there  is  no  such 
strategy,  and,  now,  also  that  there  is  no  strategy  involving  random  choices.  The 
RANDOMIZED  label  in  the  entry  for  a  knowledge  state  K  means  that  there  is  a  tree 
of  operations  that  has  some  probability  of  attaining  the  goal.  The  operations  involve 
both  standard  non-deterministic  actions  and  the  guessing  operator  SELECT.  It  is 
sometimes  also  useful  to  distinguish  between  different  RANDOMIZED  entries  based  on 
the  probability  of  success  of  attaining  the  goal  by  a  particular  sequence  of  guessing 
operations.  For  a  given  knowledge  state,  this  number  is  easily  computed  as  the 
minimum  product  of  guessing  probabilities  along  possible  paths  from  that  knowledge 
state  to  the  goal.  The  probability  represents  the  worst-case  probability  of  attaining 
the  goal  by  a  sequence  of  actions  and  guessing  operations.  It  does  not  take  into 
account  goal  attainment  that  is  possible  even  when  a  guess  is  wrong.  For  this  reason 
the probability may considerably underestimate the actual probability of success, which
calls its utility into question. Nonetheless, in some situations these probabilities
provide  a  useful  lower  bound  for  comparing  different  strategies. 

Planning 

And now for the augmented planning process. Suppose that the planner has
backchained to the $k$th column of the dynamic programming table, and is currently
considering the $(k+1)$st column. First the planner fills in all entries using only the
standard non-deterministic actions. In other words, for each knowledge state $K$, if
there is an action $A$ of the form $A : K \mapsto K_1, \ldots, K_j$, and each of the $K_i$ has a
non-BLANK entry in the $k$th column, then the entry for $K$ in the $(k+1)$st column may be taken
to be $A$. If there are several such actions $A$, then one may wish to distinguish between
different actions by considering the labels of the entries for the knowledge states $K_i$
to which the action can transit. In particular, suppose RANDOMIZED entries actually
have probabilities of success associated with them. Then it makes sense to assign
the probability 0 to any BLANK entry, and the probability 1 to any GUARANTEED
entry. One can then associate with each action $A$ a worst-case probability of success
(but recall that this may be an underestimate). Specifically, if $p_i$ is the probability
of success associated with the knowledge entry for $K_i$ in the $k$th column, then the
probability of success $p_A$ for $A$ may be taken as $\min_i \{p_i\}$. If several actions $A$ are
applicable at the current knowledge state, one can then select that action which
maximizes $p_A$. In particular, if there is an action that only transits to GUARANTEED
states, then the planner should select it. Similarly, if all actions have worst-case
probability zero of success, then the planner should simply leave the entry for $K$
BLANK. Once an action has been selected, it provides a label and/or a probability of
success for the current knowledge state $K$.

Once the entries in the $(k+1)$st column have been filled in in this way, the planner
next considers all remaining BLANK entries in that column. In particular suppose $K$
is a knowledge state whose entry is BLANK. If the knowledge state can be written as
a finite union of non-BLANK states $\{K_1, \ldots, K_q\}$, then the SELECT operator comes
into play. It provides a transition from $K$ to one of the $K_i$ via randomization. The
entry for $K$ in the $(k+1)$st column becomes a RANDOMIZED entry, with worst-case
probability of success given by $\frac{1}{q} \min_i \{p_i\}$, where $p_i$ is the worst-case probability of
success for state $K_i$ in the $(k+1)$st column. Again, the planner may wish to use SELECT
to point from $K$ to a collection $\{K_i\}$ of minimal size, or perhaps to a collection that
maximizes the worst-case probability of success.

As  usual,  one  must  ensure  that  reliable  goal  recognition  and  reliable  restart  are 
possible. 
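
The two planning passes described above can be sketched as follows. This is a simplified illustration, not the planner itself, and it makes several assumptions that are not in the text: knowledge states are frozensets of state names, each action is a dictionary from a knowledge state to the list of knowledge states it may transit to, an entry already present in column $k$ is carried over to column $k+1$, and SELECT uses every proper subset filled in the first pass as its cover rather than searching for a minimal or best collection. The checks for reliable goal recognition and reliable restart are omitted.

```python
from typing import Dict, FrozenSet, List, Tuple, Union

KState = FrozenSet[str]
# Hypothetical encoding: action name -> {knowledge state -> possible resulting knowledge states}.
Actions = Dict[str, Dict[KState, List[KState]]]
# Entry = (label, action name / SELECT cover / None, worst-case probability of success).
Entry = Tuple[str, Union[str, List[KState], None], float]


def label(p: float) -> str:
    if p == 0.0:
        return "BLANK"
    return "GUARANTEED" if p == 1.0 else "RANDOMIZED"


def fill_column(states: List[KState], actions: Actions,
                prev: Dict[KState, float]) -> Dict[KState, Entry]:
    column: Dict[KState, Entry] = {}
    # Pass 1: standard actions, with p_A = min_i p_i over the states A may transit to.
    for K in states:
        # Assumption: an entry from column k carries over to column k+1.
        best_choice, best_p = None, prev.get(K, 0.0)
        for name, transitions in actions.items():
            outcomes = transitions.get(K)
            if outcomes:
                p_A = min(prev.get(K_i, 0.0) for K_i in outcomes)
                if p_A > best_p:
                    best_choice, best_p = name, p_A
        column[K] = (label(best_p), best_choice, best_p)
    # Pass 2: SELECT for entries still BLANK, using only entries filled in pass 1.
    filled = {K: entry[2] for K, entry in column.items() if entry[2] > 0.0}
    for K in states:
        if column[K][2] > 0.0:
            continue
        cover = [K_i for K_i in filled if K_i < K]   # proper subsets of K
        if cover and frozenset().union(*cover) == K:
            p = min(filled[K_i] for K_i in cover) / len(cover)
            column[K] = ("RANDOMIZED", cover, p)
    return column


a, b, g, ab = frozenset("a"), frozenset("b"), frozenset("g"), frozenset("ab")
actions = {
    "right": {a: [g], ab: [g, b]},
    "left": {b: [g], ab: [a, g]},
}
prev = {g: 1.0}   # column k: only the goal itself is non-BLANK
column = fill_column([a, b, ab, g], actions, prev)
```

In this toy problem neither action is guaranteed from the combined state, so its entry becomes RANDOMIZED with worst-case probability 1/2, obtained by guessing between the two singleton states.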

Execution 

At run-time, suppose nominally there are $k$ steps remaining and the current knowledge
state is $K$. If the entry for $K$ is BLANK, then execution of this particular guessing
loop stops, and a new loop at the beginning of the table is restarted, if possible. If the
entry for $K$ is not BLANK, but contains an action $A$, then the system executes that
action, thereby proceeding to the $(k-1)$st column. If the entry for $K$ is RANDOMIZED
and thus contains a SELECT operation, then the system randomly chooses one of the
$\{K_i\}$ specified by this SELECT operation, whereupon the action stored in the entry
for the selected $K_i$ is executed. If ever the goal is attained, execution stops. Starting
or restarting the guessing loop entails determining an initial knowledge state by
performing a sensory operation and intersecting the resulting sensory interpretation
set with the set $\mathcal{I}_0$, in which all motions are assumed to occur. An alternative is to
restart the guessing loop by considering the set
$\mathcal{I}_0^{i+1} = \mathcal{I}_0^{i}; A_1; I_1; A_2; I_2; \cdots; A_k; I_k \subseteq \mathcal{I}_0$
in place of $\mathcal{I}_0$, where $\mathcal{I}_0^{i}$ is the initial knowledge state at the start of the $i$th iteration
of the guessing loop. This procedure preserves full history independent of any guesses,
and thereby may limit the number of states between which the strategy must guess
on each new iteration.


Examples 

For  an  example  of  multi-level  guessing  in  the  continuous  domain,  see  the  example 
of  figure  4.8  on  page  219.  For  a  simpler  example  consider  again  the  discrete 
approximation  to  the  peg-in-hole  problem  of  figure  2.13  on  page  85.  Suppose  that 
the  peg  does  not  just  fall  into  the  hole  once  it  is  above  the  hole.  Instead,  the  system 
first  must  ascertain  that  the  peg  is  above  the  hole,  then  try  to  push  down.  If  sensing 
is  poor  so  that  the  system  cannot  decide  on  which  side  of  the  hole  the  peg  is  located, 
then  the  system  may  have  to  resort  to  a  multi-level  guessing  strategy.  In  particular, 
the  system  first  guesses  on  which  side  of  the  hole  the  peg  is  located,  then  moves  in  the 
goal direction specified by this guess. Next, the system repeatedly guesses whether it
has moved the peg above the hole, and either pushes downward if it guesses “yes”, or
continues  its  motion  if  it  guesses  “no”.  If  the  system  guesses  correctly  each  time,  then 
the  peg  will  enter  the  hole.  Let  us  assume  that  this  success  is  recognized  by  some 
other  means  (for  example,  by  considering  the  height  of  the  peg  above  the  hole).  One 
could  imagine  removing  the  second  set  of  guesses  in  this  strategy,  and  instead  always 
pushing  down  after  each  move.  If  this  is  feasible  it  will  be  generated  as  a  strategy  by 
the  dynamic  programming  approach.  However,  perhaps  pushing  down  disturbs  some 
other  parameter  of  the  system  whenever  the  peg  is  not  above  the  hole.  For  instance, 
if  the  peg  is  gripped  by  a  robot  hand,  the  fingers  might  slide,  and  the  peg  might  have 
to  be  regrasped  from  some  initial  configuration.  In  this  case  it  might  be  better  not  to 
push  down  after  each  attempt.  Another  possibility  is  that  there  are  multiple  holes, 
so  that  pushing  the  peg  down  into  the  wrong  hole  requires  extracting  it  again.  In 
any  event,  both  types  of  strategies  may  be  generated  by  the  dynamic  programming 
approach. 


Randomization  Can  Solve  Nearly  Any  Task 

Once  one  has  an  operator  such  as  SELECT,  one  can  solve  any  task  for  which  there 
is  some  chance  of  attaining  the  goal!  As  usual,  this  assumes  goal  recognizability  and 
reliable  restart.  In  order  to  see  that  any  problem  is  solvable,  first  recall  claim  3.4. 
This  claim  tells  us  that  whenever  it  is  “certainly  possible”  to  move  from  any  state 
to  the  goal,  then  there  actually  exists  a  guaranteed  strategy  for  attaining  the  goal, 
assuming  a  perfect-sensing  function.  Furthermore,  this  strategy  requires  at  most 
$r = |S| - |G|$ steps. A guessing strategy may thus be constructed. The strategy
simulates the perfect sensor by guessing the actual state of the system at each step of
the perfect-sensing strategy, before deciding on the next action to execute. Of course,
the worst-case expected execution time of such a randomized strategy may be quite
bad. In particular, the probability of guessing the state correctly during all stages
of an $r$-step strategy may be on the order of $1/r!$. Thus the worst-case expected
execution time is $O(r \, r!)$.

3.12 Comments and Extensions

3.12.1  Randomization  in  Probabilistic  Settings 

The  knowledge  states  in  the  probabilistic  setting  are  probability  distributions  on  the 
underlying  state  space.  In  other  words,  each  knowledge  state  is  an  ordered  n-tuple  of 
non-negative numbers that add up to one, where $n = |S|$.

If,  as  we  have  assumed,  all  of  the  underlying  states  are  connected  to  the  goal, 
then  for  each  state  one  can  determine  a  sequence  of  at  most  r  transitions  leading 
from  the  state  to  the  goal.  Here  r  is  the  number  of  non-goal  states,  as  usual.  The 
probability  of  actually  executing  this  sequence  is  at  least  equal  to  the  product  of  the 
probabilities  along  each  of  the  arcs.  For  each  state  one  can  easily  determine  (using 
Dijkstra’s  algorithm)  a  sequence  of  transitions  of  maximum  probability.  A  randomized 
strategy  of  the  flavor  discussed  for  the  non-deterministic  case  would  consist  of  guessing 
the  underlying  start  state  of  the  system,  then  executing  a  sequence  of  actions 
corresponding  to  the  sequence  of  transitions  thus  determined.  The  probability  of 
attaining  the  goal  is  then  at  least  equal  to  the  probability  of  guessing  the  correct 
start  state,  multiplied  by  the  probability  of  actually  executing  the  sequence  leading 
from that state to the goal. This probability is bounded from below by $\frac{1}{r} p^r$, where
p  is  the  smallest  probability  appearing  on  any  arc  in  the  r  sequences  of  transitions 
leading  to  the  goal.  This  number  may  be  quite  small  in  general.  Of  course,  if  there 
exists  a  guaranteed  strategy  for  attaining  the  goal,  assuming  perfect  sensing,  then 
there  exists  a  guessing  strategy  just  as  for  the  non-deterministic  case  above.  For  both 
types  of  randomized  strategies,  it  is  assumed  that  the  goal  is  reliably  recognizable. 
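
The maximum-probability sequences mentioned above can be found with Dijkstra's algorithm by using $-\log p$ as the arc length, since maximizing a product of probabilities is the same as minimizing the sum of their negative logarithms. The sketch below is one possible rendering under assumed conventions: the transition graph is a dictionary from each state to a list of (successor, probability) arcs, with the choice of action already folded into each arc, and the search runs backward from the goal. The function name and the example graph are hypothetical.

```python
import heapq
import math
from typing import Dict, Iterable, List, Tuple

Arcs = Dict[str, List[Tuple[str, float]]]  # hypothetical: state -> [(successor, probability)]


def max_probability_to_goal(arcs: Arcs, goal: Iterable[str]) -> Dict[str, float]:
    """Return, for each state connected to the goal, the largest path probability."""
    # Reverse the graph and use -log(p) >= 0 as the arc length.
    reverse: Dict[str, List[Tuple[str, float]]] = {}
    for s, outs in arcs.items():
        for t, p in outs:
            if p > 0.0:
                reverse.setdefault(t, []).append((s, -math.log(p)))
    dist = {g: 0.0 for g in goal}
    heap = [(0.0, g) for g in dist]
    heapq.heapify(heap)
    while heap:
        d, t = heapq.heappop(heap)
        if d > dist.get(t, math.inf):
            continue  # stale queue entry
        for s, w in reverse.get(t, []):
            nd = d + w
            if nd < dist.get(s, math.inf):
                dist[s] = nd
                heapq.heappush(heap, (nd, s))
    return {s: math.exp(-d) for s, d in dist.items()}


# Illustrative graph: the two-arc route a -> b -> g beats the direct arc a -> g.
arcs = {"a": [("b", 0.9), ("g", 0.5)], "b": [("g", 0.8)]}
best = max_probability_to_goal(arcs, {"g"})
```

States that do not appear in the result have no positive-probability path to the goal in the given graph.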

In  general,  however,  if  one  has  probabilities  available  for  the  actions  and  sensors, 
then  it  does  not  make  much  sense  to  randomize  in  the  way  one  might  do  for  the  non- 
deterministic  case.  In  particular,  the  probability  of  executing  a  sequence  of  transitions 
from  a  state  to  the  goal  is  often  a  severe  underestimate  of  the  actual  probability  of 
attaining  the  goal.  This  was  made  clear  by  the  examples  on  random  walks.  Instead  of 
constructing  strategies  that  randomize  by  guessing,  it  is  generally  more  useful  either 
to  construct  strategies  that  make  local  progress  or  to  solve  the  complete  Markov 
Decision  Problem  and  try  to  minimize  the  expected  time  to  attain  the  goal. 

There  is  one  special  form  of  randomization  that  does  appear  fairly  directly  in 
the  probabilistic  setting.  This  consists  of  moving  the  state  of  the  system  in  order 
to  change  the  probability  distribution  over  the  state  space,  say  to  equalize  it.  This 
randomization  is  useful  for  some  tasks  where  it  is  desired  to  meet  some  action's 
preconditions  at  least  probabilistically.  The  main  purpose  of  this  randomization  in 
the  domain  of  manipulation  is  to  blur  environmental  details.  A  natural  setting  is 
in  tasks  that  involve  geometric  uncertainty.  An  example  is  given  by  a  peg-in-hole 
problem  in  which  the  location  of  the  hole  is  not  modelled  accurately.  By  randomizing 
the peg’s position near the hole, a robot can in many cases ensure that there is a non-zero
probability of starting from a location from which goal attainment is possible.

The  parts-sieving  example  of  chapter  1  tried  to  make  a  similar  point.  In  that 
example  the  geometric  uncertainty  was  in  the  exact  shape  and  size  of  the  sieve 
elements.

In general, given an action (or composite action consisting of several actions) that
is to be repeated over and over, one can determine the steady state distribution over
the  state  space  using  the  theory  of  Markov  chains  as  discussed  in  section  3.4.1.  One 
can  compare  different  actions  in  terms  of  the  final  distribution  attained,  and  in  terms 
of  the  expected  time  until  steady  state  is  achieved. 

In  some  cases,  the  actions  required  to  attain  a  particular  randomization  may  be 
clear  from  context.  For  instance,  in  order  to  achieve  a  uniform  distribution  over  a 
bounded one-dimensional lattice, it suffices to perform a standard one-dimensional
random walk, with reflection at either end of the lattice. There has been considerable
work on estimating the time required for convergence to a uniform distribution for
random walks on lattices (see for instance the article on card-shuffling [AD]). Related
work dealing with random walks on graphs includes [GJ], [AKLLR], [SJ], [CRRST],
and [Z].
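
The convergence to a uniform distribution can be illustrated with a minimal sketch. It assumes one particular reflection rule, namely that a step which would leave the lattice leaves the walker where it is; this rule, the lattice size, and the function names are illustrative assumptions rather than details taken from the text.

```python
def reflecting_walk_step(dist):
    """Apply one step of a symmetric random walk to a distribution on {0..n-1}.

    Assumed reflection rule: a step that would leave the lattice keeps the
    walker in place, so each endpoint holds with probability 1/2.
    """
    n = len(dist)
    new = [0.0] * n
    for i, p in enumerate(dist):
        left = i - 1 if i > 0 else i
        right = i + 1 if i < n - 1 else i
        new[left] += 0.5 * p
        new[right] += 0.5 * p
    return new


def distribution_after(start, n, steps):
    """Distribution over the lattice after the given number of steps from a known start."""
    dist = [0.0] * n
    dist[start] = 1.0
    for _ in range(steps):
        dist = reflecting_walk_step(dist)
    return dist


# Starting with certainty at one end of a five-point lattice.
final = distribution_after(start=0, n=5, steps=200)
```

Under this rule the transition matrix is symmetric, so the uniform distribution is stationary, and the holding probability at the endpoints rules out periodic behavior.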

3.12.2  Randomization: State-Guessing versus State-Distribution

The  previous  sections  have  indicated  how  a  system  can  probabilistically  attain  a 
goal  by  randomly  choosing  between  several  guaranteed  strategies,  whose  applicability 
conditions  individually  cannot  be  met,  but  which  are  met  when  taken  as  a  disjunctive 
collection.  This  form  of  randomization  has  a  different  flavor  than  the  randomization 
indicated  in  the  early  sections  of  the  chapter,  namely  in  the  gear-meshing  and  parts- 
sieving  examples  (see  also  section  1.2).  In  those  tasks,  there  was  a  single  action 
that would attain the goal, given that the action’s pre-conditions were met. The pre-conditions
could not be satisfied with certainty, but could be satisfied probabilistically
by  randomly  moving  the  system  about,  such  as  by  twirling  the  gears  or  shaking 
the  sieve.  The  randomization  in  these  cases  seems  more  direct,  since  it  actually 
randomizes  the  state  of  the  system,  than  does  the  randomization  achieved  via 
guessing.  However,  these  two  forms  of  randomization  are  actually  very  similar.  In 
particular,  suppose  that  some  knowledge  state  K  is  a  precondition  to  action  A,  where 
action A is guaranteed to achieve the goal G. Now suppose that the initial state of the
system is known only to lie in some set $I_0$ that contains K. The state-distribution
approach consists of randomizing the states within $I_0$, so that there is some non-zero
probability of actually being in the set K. [If it is true equalization, then that
probability is $|K|/|I_0|$.] This means in particular that it is “certainly possible” to
reach K from any state in $I_0 - K$. Thus there must be a perfect-sensing strategy for
attaining K, and hence a randomization by guessing strategy for attaining G, from
any point in $I_0$. [As usual, it is assumed that the goal is recognizable reliably and that
the guessing strategy may be restarted reliably.] Conversely, suppose that there exists
a guessing strategy for attaining K. Then in some sense there exists a strategy that
randomizes the state of the system. After all, if one considers all possible guesses
in the guessing strategy, these define a random collection of action sequences that
randomize the state of the system. However, it need not be the case that there is a
well-defined distribution over $I_0$, nor that all states of $I_0$ are necessarily reachable.
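
A short worked consequence of the equalization case may be useful. It rests on an added assumption that the text does not state: each repetition of the randomization yields a fresh, independent, exactly uniform draw from $I_0$, and the attempt can be restarted after a recognized failure. The symbol $p$ is introduced here only for illustration.

```latex
p = \frac{|K|}{|I_0|}, \qquad
\Pr[\text{first success on attempt } n] = (1-p)^{n-1}\, p, \qquad
E[\text{number of attempts}] = \sum_{n \ge 1} n\,(1-p)^{n-1}\, p = \frac{1}{p} = \frac{|I_0|}{|K|}.
```

Under these assumptions the expected number of attempts grows in proportion to how small K is relative to $I_0$.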

More  generally,  the  set  K  may  not  be  known,  of  course,  which  is  why  true 
randomization  via  state-motion  may  be  required.  Formally,  however,  this  presents 
no  problem  in  drawing  a  connection  between  randomization  via  state-distribution 
and  randomization  via  state-guessing.  This  is  because  one  can  often  augment 
the  underlying  state  space  with  an  extra  dimension  that  encodes  the  parameters 
whose  unknown  values  define  K.  See  [Don89]  for  further  details  on  handling  model 
uncertainty.  In  other  words,  one  may  not  know  whether  it  is  possible  to  get  from 
some  state  to  the  goal  under  some  action,  so  sometimes  one  guesses  that  it  is  possible 
and  executes  the  action,  whereas  at  other  times  one  guesses  that  it  is  not  possible, 
and  instead  moves  to  a  completely  different  state. 


3.12.3  Feedback  Randomization 

In  the  previous  guessing  strategies  extensive  use  was  made  of  history.  Certainly 
history  plays  a  major  role  within  each  of  the  guaranteed  strategies.  Indeed,  new 
knowledge  states  are  formed  from  old  ones  by  forward  projecting  the  effect  of  actions, 
then  intersecting  these  with  sensory  interpretation  sets.  Similarly,  each  time  the 
guessing  strategy  randomly  selects  a  particular  knowledge  state,  it  is  effectively 
assuming  a  particular  history.  All  actions  following  this  random  selection  update 
knowledge  states  in  the  usual  manner,  so  that  the  derived  history  is  correct  in  so  far 
as  the  random  selection  was  correct. 

The  process  of  guessing  history  can  be  extremely  useful  when  a  strategy  depends 
on  extensive  history  to  prune  possible  sensory  interpretations.  If  sensing  uncertainty 
is  large,  it  might  otherwise  never  be  possible  to  select  the  correct  motions  to  perform. 
By  guessing  some  of  this  history,  goal  attainment  is  possible,  at  least,  if  the  guess  is 
correct.  On  the  other  hand,  in  some  cases,  if  the  guess  is  incorrect,  it  may  take  several 
steps  of  execution  before  an  inconsistency  is  detected  or  before  failure  to  attain  the 
goal  terminates  the  loop.  In  particular,  in  the  case  of  no  sensing  (except  for  goal 
recognition),  a  guaranteed  strategy  that  has  been  randomly  selected  may  have  to  run 
its  full  course  before  the  system  can  recognize  goal  failure.  For  instance,  imagine 
that  one  has  the  diagram  for  a  maze  in  a  cave,  but  is  blindfolded  (and  not  allowed 
to  purposefully  feel  one’s  way  along  the  walls  of  the  cave).  So  sensing  is  very  limited. 
Suppose, however, that one can turn fairly accurately and can measure distance by
walking  fairly  accurately,  so  that  one  can  actually  follow  the  map  well,  based  purely 
on  dead  reckoning.  In  other  words,  control  and  thus  history  are  very  good.  Thus,  if 
one  knows  one’s  starting  position  or  can  guess  it  fairly  accurately,  then  one  has  a  good 
chance  of  getting  out  of  the  cave  quickly,  whereas  if  one  can  only  guess  one’s  starting 
location  with  enormous  uncertainty,  then  the  time  required  may  be  proportional  to 
the  size  of  the  cave  times  the  time  required  to  execute  a  single  attempt  to  exit  the 
cave. 

Using  Current  Sensed  Information  Only

An  alternative  to  retaining  history  in  updating  the  knowledge  state  after  each  motion 
is  to  simply  use  the  state  of  knowledge  returned  by  the  current  sensory  value.  More 
generally,  the  constraints  imposed  by  one’s  hardware  or  timing  considerations  may 
require  that  one  design  strategies  whose  actions  are  based  solely  on  current  sensed 
values,  and  not  on  history.  For  this  reason  it  is  natural  to  consider  approaches  for 
synthesizing simple feedback loops.⁴ Consider the representation of actions. Suppose
that the effect of an action A on knowledge state $K_1$ is $K_2$, and that the range of
possible sensory interpretation sets associated with $K_2$ is
$\Xi(K_2) = \bigcup_{s \in K_2} \Xi(s) = \{I_1, I_2, \ldots, I_\ell\}$.
In the framework developed thus far, one models the induced action A as

$$A : K_1 \mapsto K_2^1, K_2^2, \ldots, K_2^\ell,$$

where $K_2^i = K_2 \cap I_i$. In a framework without history one models the action simply as

$$A : K_1 \mapsto I_1, I_2, \ldots, I_\ell.$$

The  first  expression  models  history,  the  second  only  models  possible  sensing 
information.  Thus  the  only  knowledge  states  that  are  relevant  are  those  corresponding 
to  possible  sensory  interpretation  sets. 
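
The two action models can be contrasted in a minimal sketch. The three-state example, the dictionary encoding of actions and sensing, and all names below are hypothetical and chosen only to illustrate the distinction.

```python
def forward_project(knowledge_state, action):
    """Union of all states reachable from the knowledge state under a non-deterministic action."""
    result = set()
    for s in knowledge_state:
        result |= action[s]
    return frozenset(result)


def interpretation_sets(knowledge_state, sensing):
    """All sensory interpretation sets that some state in the knowledge state can give rise to."""
    sets = set()
    for s in knowledge_state:
        sets |= sensing[s]
    return sets


def successors_with_history(k1, action, sensing):
    """History-based model: intersect the forward projection with each interpretation set."""
    k2 = forward_project(k1, action)
    return {k2 & i for i in interpretation_sets(k2, sensing)}


def successors_without_history(k1, action, sensing):
    """History-free model: the new knowledge state is the interpretation set itself."""
    k2 = forward_project(k1, action)
    return interpretation_sets(k2, sensing)


# Hypothetical example: the action moves state "a" to "b".
ACTION = {"a": frozenset({"b"}), "b": frozenset({"b", "c"}), "c": frozenset({"c"})}
I_AB = frozenset({"a", "b"})
I_BC = frozenset({"b", "c"})
SENSING = {"a": {I_AB}, "b": {I_AB, I_BC}, "c": {I_BC}}

with_history = successors_with_history(frozenset({"a"}), ACTION, SENSING)
without_history = successors_without_history(frozenset({"a"}), ACTION, SENSING)
```

In this example the history-based model pins the state down to "b", whereas the history-free model is left with one of two coarser interpretation sets.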

Clearly,  fewer  tasks  are  solvable  in  a  guaranteed  sense  with  this  type  of  approach, 
since  it  is  in  general  more  difficult  to  constrain  the  apparent  state  of  the  system. 
From  a  probabilistic  point  of  view,  solvability  has  not  changed.  This  is  because  the 
existence  of  a  randomized  strategy  depends  only  on  goal  reachability,  a  condition 
that  may  be  checked  by  determining  whether  a  perfect-sensing  strategy  exists.  For 
a  perfect  sensor,  history  adds  no  extra  information.  Of  course,  once  one  tries  to 
simulate  the  perfect-sensing  strategy  using  an  actual  sensor  and  a  guessing  strategy, 
the  quality  of  one’s  knowledge  states  determines  the  expected  time  until  the  goal 
is  attained.  For  a  purely  sensor-based  system,  that  is,  a  system  without  history, 
although  all  tasks  are  still  solvable  probabilistically,  the  expected  convergence  time 
will  in  general  increase. 

As an example consider a simple peg-in-hole problem. Either the two-dimensional
peg-in-hole of figure 2.2 or the abstraction of the three-dimensional peg-in-hole
discussed in section 1.1 is a possible example. A perfect-sensing strategy might
consist  of  moving  straight  towards  the  hole.  However,  if  there  is  sensing  uncertainty 
and  the  system  does  not  retain  history,  then  it  will  become  confused  near  the  hole. 
Instead  of  relying  on  accurate  information,  the  system  effectively  must  guess  where 
it  is  located.  This  manifests  itself  in  the  execution  of  a  random  action.  The 
difference  between  history-based  and  simple  feedback  loops  is  particularly  striking 
in  the  example  of  figure  2.2.  In  this  example  the  motions  are  one-dimensional. 
Thus a randomized strategy that retained history could simply make a single guess,
executing a long motion to attain the goal. Should failure occur, the strategy would
then possess enough information to direct it towards the goal accurately. However,
a simple feedback loop that does not retain history would make repeated guesses,
effectively executing a random walk on one of the edges near the hole until it attained
the goal. Thus in this example the difference between retaining history and only
considering current sensory information manifests itself as the difference between
linear and quadratic expected convergence times.

4 Simple feedback refers to the feedback of current sensed values without retaining past sensed
values.
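
The quadratic growth can be checked on a small model. The sketch below assumes a symmetric random walk on positions 0 through n, with the goal at 0 and a reflecting barrier at n; this model and the function name are illustrative assumptions, not the geometry of figure 2.2 itself.

```python
def expected_steps_feedback_walk(n):
    """Expected number of steps for a symmetric random walk to first reach 0 from n.

    Assumed model: positions 0..n, goal at 0, reflecting barrier at n.
    The hitting times satisfy E[0] = 0, E[n] = 1 + E[n-1], and
    E[k] = 1 + (E[k-1] + E[k+1]) / 2 for 0 < k < n.  Writing
    d[k] = E[k] - E[k-1], these give d[n] = 1 and d[k] = d[k+1] + 2.
    """
    total = 0
    d = 1  # d[n]
    for _ in range(n):
        total += d  # accumulate d[n], d[n-1], ..., d[1]
        d += 2
    return total  # E[n] = d[1] + d[2] + ... + d[n]


# Expected steps for a few starting distances.
table = {n: expected_steps_feedback_walk(n) for n in (1, 5, 10, 20)}
```

The recurrence sums the first n odd numbers, so the expected time is n squared, while a single directed motion over the same distance takes on the order of n steps.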

Using  the  full  sensory  interpretation  set  at  each  step  rather  than  intersecting  it 
with  past  history  has  at  least  one  desirable  characteristic,  namely  it  preserves  truth. 
In  contrast,  a  guessing  strategy  that  assumes  a  particular  history  need  not  preserve 
truth.  Indeed  the  truth  is  fudged  in  order  to  provide  a  minimum  probability  of  success. 
However,  in  some  cases,  namely  those  in  which  an  adversary  cannot  force  indefinite 
failure,  and  in  which  progress  towards  the  goal  is  possible  on  average,  a  feedback 
loop  based  on  current  sensed  values  can  provide  reasonable  convergence  times  while 
preserving  accurate  knowledge  at  each  step  of  the  strategy. 

Progress  Measures 

One  nice  property  of  a  perfect-sensing  plan  is  that  it  places  an  implicit  progress 
measure  on  the  underlying  state  space.  This  was  made  explicit  by  claim  3.12.  Such 
a  simple  progress  measure  on  the  underlying  state  space  is  not  as  easily  provided  by 
plans  that  involve  general  knowledge  states,  simply  because  a  state  may  be  a  member 
of  several  different  knowledge  states  that  have  different  labellings.  Only  in  the  perfect- 
sensing  case  is  there  necessarily  a  unique  labelling  of  states.  This  labelling  plays  the 
same  role  that  the  duration  labels  did  in  the  Markov  chain  case.  We  observed  earlier 
that  the  Markov  chain  model  applied  even  in  the  imperfect-sensing  case,  so  long  as 
the  action  taken  at  any  time  was  solely  a  probabilistic  function  of  the  state  of  the 
system  (in  particular,  time  and  history  invariant).  A  similar  statement  applies  in  the 
non-deterministic  case,  so  that  it  makes  sense  to  think  about  progress  measures  even 
with  imperfect  sensors.  We  alluded  to  this  in  section  3.6,  but  now  is  a  good  time 
to  take  a  closer  look.  The  discussion  should  tie  together  the  concepts  of  progress 
measures  and  randomization  by  guessing  in  the  setting  of  strategies  that  rely  purely 
on  current  sensory  feedback  and  not  on  history. 

Feedback  with  Progress  Measures 

Suppose the collection $\{S_j\}_{j=0}^{\ell}$ is given as per claim 3.12 for some discrete planning
problem with non-deterministic actions. Let the label for each state s simply be the
index j of the unique set $S_j$ that contains the state s. Define, as in section 3.6, the
worst-case velocity $v_{A,s}$ relative to some action A at some state s to be the maximum
possible change in labellings, where the sign of the change is significant. An action is
said to make progress at a state s precisely when $v_{A,s}$ is negative.
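
A minimal sketch of this definition follows. The three labelled states and the two actions are hypothetical, and actions are encoded as dictionaries from a state to its set of possible successors.

```python
def worst_case_velocity(state, action, label):
    """Largest signed change in label that the action can cause at the state."""
    return max(label[t] - label[state] for t in action[state])


def makes_progress(state, action, label):
    """An action makes progress at a state when its worst-case velocity is negative."""
    return worst_case_velocity(state, action, label) < 0


# Hypothetical labelling: the label counts the steps needed in the worst case.
LABEL = {"goal": 0, "near": 1, "far": 2}
ADVANCE = {"far": {"near", "goal"}, "near": {"goal"}, "goal": {"goal"}}
HESITATE = {"far": {"far", "near"}, "near": {"near", "goal"}, "goal": {"goal"}}

v_advance = worst_case_velocity("far", ADVANCE, LABEL)
v_hesitate = worst_case_velocity("far", HESITATE, LABEL)
```

The second action can sometimes lower the label, but because it may also leave the state unchanged its worst-case velocity is zero and it does not count as making progress.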

Now  consider  how  a  simple  feedback  strategy  operates.  At  any  instant  it  has 
available some sensory interpretation set I. Given this sensory information the
strategy executes some action A. We are assuming that the choice of action A depends
only  on  the  sensory  information  and  not  on  any  hidden  state  variables  that  encode 
history  or  the  passage  of  time.  Thus  A  is  either  uniquely  determined  by  I  or  chosen 
randomly  from  a  collection  of  actions  that  is  uniquely  determined  by  I. 

If  one  wishes  to  ensure  that  at  each  step  the  strategy  makes  progress  relative 
to some labelling, then it must be the case that all non-goal states $s \in I$ transit
to a state with lower label when A is executed, and all goal states remain in the
goal. This in turn implies that there is actually a guaranteed strategy for attaining
the goal, assuming that the goal is reliably recognizable once entered. Furthermore,
the strategy converges in no more than $\ell$ steps, where $\ell$ is the highest possible label
assigned to a state. The guaranteed strategy operates simply by executing that action
A that ensures progress for all states in I, whenever the sensory interpretation set is
I.

Planning  Limitations 

Before  we  comment  on  the  generality  of  this  approach,  let  us  observe  that  even 
though  there  exists  a  guaranteed  strategy  whenever  progress  is  ensured  for  all  possible 
sensory  interpretation  sets,  it  need  not  be  the  case  that  a  planner  that  only  considers 
knowledge  states  corresponding  to  sensory  interpretation  sets  can  actually  construct 
this  strategy.  This  is  because  some  notion  of  history  is  required  in  order  to  recognize 
convergence  of  the  strategy,  even  though  the  strategy  itself  does  not  make  use  of 
history.  In  particular,  the  planner  may  not  be  able  to  synthesize  the  relevant  progress 
measure. 

For  a  simple  example,  consider  figure  3.18.  There  are  four  states  and  two  actions. 
Action $A_1$ is guaranteed to move state $s_1$ to the goal $s_G$, while it moves state
$s_3$ non-deterministically either to state $s_1$ or state $s_2$. It leaves all other states
unchanged. Similarly, action $A_2$ is guaranteed to move state $s_2$ to the goal, while
non-deterministically moving state $s_3$ to either $s_1$ or $s_2$. Suppose that the set of
sensor values is given by three possible interpretation sets, namely $\{s_1, s_3\}$, $\{s_2, s_3\}$,
and $\{s_G\}$. So goal recognizability is ensured.

This  example  might  be  an  abstract  version  of  the  two-dimensional  peg-in-hole 
problem  of  figure  2.2,  with  an  additional  state  corresponding  to  the  placement  of  the 
peg  in  free  space.  The  analogous  sensing  would  be  to  assume  that  the  system  can 
distinguish  on  which  side  of  the  hole  the  peg  is  located,  but  that  the  system  cannot 
decide  whether  the  peg  has  made  contact  with  a  surrounding  edge,  as  opposed  to 
being  in  free-space  above  the  hole. 

A  guaranteed  simple  feedback  strategy  for  attaining  the  goal  is  of  the  form: 

•  If the sensory interpretation set is $\{s_1, s_3\}$, then execute action $A_1$.

•  If the sensory interpretation set is $\{s_2, s_3\}$, then execute action $A_2$.

•  If the sensory interpretation set is $\{s_G\}$, then terminate successfully.

Figure 3.18: For this state diagram and the collection of possible sensory
interpretation sets, there exists a guaranteed strategy for attaining the goal.
Furthermore, the strategy does not require history to execute. However, a
backchaining planner that ignores history cannot generate the strategy. [The sensory
interpretation sets are indicated by rectangles surrounding the states.]

The strategy is guaranteed to succeed because at each step it ensures progress relative
to a progress measure that labels $s_G$ with 0, $s_1$ and $s_2$ with 1, and $s_3$ with 2. Observe,
however, that if a planner only considered the three knowledge states given by the
sensory information, then it could not backchain even one level. This is because,
for example, there is no action that guarantees that the knowledge state $\{s_1, s_3\}$
is transformed into either of the other two knowledge states. Of course, a planner
that made full use of history would be able to synthesize a guaranteed strategy for
attaining the goal.
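
Both observations about this example can be checked mechanically. The sketch below encodes the states, actions, interpretation sets, and labels described for figure 3.18; the dictionary encoding, the function names, and the particular form of the history-free backchaining loop are assumptions made for illustration.

```python
def forward_project(knowledge_state, action):
    """Union of all states reachable from the knowledge state under the action."""
    result = set()
    for s in knowledge_state:
        result |= action[s]
    return frozenset(result)


def interpretation_sets(knowledge_state, sensing):
    """All interpretation sets that some state in the knowledge state can produce."""
    sets = set()
    for s in knowledge_state:
        sets |= sensing[s]
    return sets


I_13 = frozenset({"s1", "s3"})
I_23 = frozenset({"s2", "s3"})
I_G = frozenset({"sG"})
SENSING = {"s1": {I_13}, "s2": {I_23}, "s3": {I_13, I_23}, "sG": {I_G}}
A1 = {"s1": {"sG"}, "s2": {"s2"}, "s3": {"s1", "s2"}, "sG": {"sG"}}
A2 = {"s1": {"s1"}, "s2": {"sG"}, "s3": {"s1", "s2"}, "sG": {"sG"}}
LABEL = {"sG": 0, "s1": 1, "s2": 1, "s3": 2}
POLICY = {I_13: A1, I_23: A2}  # the goal set {sG} means "terminate"


def policy_ensures_progress(policy, label):
    """True if every state in every non-goal interpretation set strictly lowers its label."""
    for i_set, action in policy.items():
        for s in i_set:
            if any(label[t] >= label[s] for t in action[s]):
                return False
    return True


def memoryless_backchain(actions, sensing, goal_set):
    """Backchain from the goal using only interpretation sets as knowledge states."""
    all_sets = set().union(*sensing.values())
    solved = {goal_set}
    changed = True
    while changed:
        changed = False
        for i_set in all_sets - solved:
            for action in actions:
                results = interpretation_sets(forward_project(i_set, action), sensing)
                if results <= solved:  # every possible outcome is already solved
                    solved.add(i_set)
                    changed = True
                    break
    return solved


progress_ok = policy_ensures_progress(POLICY, LABEL)
solved_sets = memoryless_backchain([A1, A2], SENSING, I_G)
```

The feedback policy passes the progress check, yet the history-free backchaining loop never extends the solved collection beyond the goal set, since every action applied to a non-goal interpretation set can return a non-goal interpretation set.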

More  generally,  consider  a  strategy  that  only  uses  current  sensory  feedback  at 
execution  time,  but  is  guaranteed  to  converge  to  a  goal  because  it  is  assured  of  local 
progress  relative  to  some  labelling.  Then  there  need  not  be  a  solution  visible  to  a 
planner  that  only  considers  knowledge  states  that  are  possible  sensory  interpretation 
sets,  but  there  always  will  be  a  solution  visible  to  a  planner  that  considers  full 
history  and  sensing  information  (this,  in  the  preimage  setting,  is  a  consequence  of 
Mason’s  completeness  result  [Mas84]).  After  all,  the  history  available  to  the  execution 
system  (and  the  planner)  must  be  at  least  as  constraining  as  the  implied  history  of 
the  progress  measure.  However,  a  planner  that  uses  full  history  in  synthesizing  a 
guaranteed  strategy  need  not  find  a  strategy  that  is  necessarily  executable  using  only 
a  simple  feedback  system.  This  is  because  the  planner  may  specify  different  actions 
for  two  knowledge  states  that  can  give  rise  to  the  same  sensory  interpretation  set  at 
run-time.  Some  additional  mechanism  would  be  required  to  ensure  that  a  stationary 
strategy  based  purely  on  current  sensory  information  is  derivable  from  the  guaranteed 
strategy  suggested  by  the  planner. 

As an example, suppose in the previous figure there is a third action $A_3$ whose
effect on state $s_3$ is to move non-deterministically to one of the states $s_1$ or $s_2$. All
other states are left unaffected by this action. Then a possible backchaining table for
a guaranteed strategy might be of the following form. [Notice that not all knowledge
states are needed in determining a guaranteed plan. For instance, if we assume initial
sensing, then the knowledge state $\{s_1, s_2, s_3\}$ is easily ruled out.]

[Table: backchaining table listing, for each knowledge state (for example $\{s_1, s_3\}$), an action guaranteed to attain the goal; surviving entries include $A_1$, $A_3$, and stop. The full layout is not recoverable here.]

Actions guaranteed to attain the goal.

Now observe that at run-time, if the actual state of the system is $s_1$, then the sensor
will return the interpretation set $\{s_1, s_3\}$. The table would say to execute $A_3$ and
sense, but, of course, that is not the right thing to do in state $s_1$. Similarly for
$s_2$. If by chance the planner had returned the same table, but with $A_1$ and $A_2$ in
the appropriate places instead of $A_3$, then a consistent stationary simple feedback
strategy would have been obtainable. The problem is that just running the planner
does not ensure such a policy.

Progress  as  a  Generalization  of  Guarded  Moves 

This  discussion  indicates  that  progress  measures  form  a  useful  intermediate  planning 
approach,  situated  between  strategies  that  employ  perfect  sensing  and  those  that 
rely  on  full  history.  In  many  cases  the  progress  measure  is  naturally  derived  from 
the  perfect-sensing  strategy,  although  arbitrary  progress  measures  are  imaginable. 
The  progress  measure  approach  is  a  natural  generalization  of  those  strategies  that 
execute a single action over and over until some sensory condition is met (see the
discussion  on  guarded  moves  on  page  46).  For  instance,  the  underlying  primitive  of  the 
preimage  methodology  is  a  single  command  that  is  executed  until  some  termination 
condition  is  true  (see  chapter  4).  Moving  down  until  one  feels  a  force  of  collision  is  a 
typical  application  of  such  a  primitive  action.  In  the  discrete  context  this  primitive 
corresponds  to  moving  through  a  progression  of  states  under  the  repeated  application 
of  a  single  action,  until  some  goal  is  attained.  The  progress  measure  is  simply  the 
distance  moved,  or  perhaps  the  change  in  some  coordinate.  The  notion  of  progress 
is  of  course  more  general  than  progress  relative  to  a  single  action,  and  much  of  this 
chapter  has  been  concerned  with  generalizing  that  notion.  The  more  general  notion 
involves  categorizing  states  by  how  far  they  are  from  the  goal  in  terms  of  how  many 
actions  may  be  required  maximally  to  attain  the  goal,  as  discussed  in  claim  3.12. 

Guessing,  Whenever  Progress  is  not  Possible 

Unfortunately,  it  may  not  always  be  possible  to  ensure  that  progress  is  made  at  every 
state  for  every  possible  sensory  interpretation  set  that  might  arise  while  the  system 
is  in  that  state.  In  these  cases  it  is  useful  to  randomize  by  guessing  as  before.  In 
other words, if some sensory interpretation set is of the form $I = \bigcup_{i=1}^{q} K_i$, such that
there are actions $A_i$ that cause every state in $K_i$ to make progress, then the system
should randomly choose one of the $A_i$ to execute. This guessing is similar to the
guessing  employed  in  the  randomization  of  section  3.9.  The  difference  is  that  now  the 
knowledge  state  of  the  system  is  the  most  recent  sensory  interpretation  set,  rather 
than  a  state  derived  from  previous  guesses  and  actions.  One  imagines  that  in  the 
worst  case  each  step  of  the  strategy  requires  an  n-way  guess.  Such  could  be  the 
case  in  a  sensorless  task  (sensorless  except  for  goal  recognizability).  However,  in  that 
setting  one  would  probably  do  well  to  employ  some  form  of  history. 
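
A minimal simulation sketch of such a guessing feedback loop follows. The two-state example, in which the sensor cannot distinguish $s_1$ from $s_2$, is hypothetical, as are the names and the uniform choice among candidate actions; non-deterministic outcomes are resolved randomly here rather than adversarially.

```python
import random

GOAL = "sG"
I_12 = frozenset({"s1", "s2"})  # the sensor cannot tell s1 from s2
I_G = frozenset({"sG"})

# A1 makes progress from s1 only, A2 from s2 only.
A1 = {"s1": ["sG"], "s2": ["s2"], "sG": ["sG"]}
A2 = {"s1": ["s1"], "s2": ["sG"], "sG": ["sG"]}

# Candidate actions per interpretation set: I_12 is covered by {s1} and {s2}.
CANDIDATES = {I_12: [A1, A2]}


def sense(state):
    """Return the current sensory interpretation set (goal recognition is reliable)."""
    return I_G if state == GOAL else I_12


def run_feedback_loop(state, rng, max_steps=10_000):
    """Guess among the candidate actions for the current interpretation set until the goal is sensed."""
    steps = 0
    while sense(state) != I_G and steps < max_steps:
        action = rng.choice(CANDIDATES[sense(state)])  # the random guess
        state = rng.choice(action[state])              # outcome of the action
        steps += 1
    return state, steps


rng = random.Random(0)
runs = [run_feedback_loop("s1", rng) for _ in range(2000)]
mean_steps = sum(steps for _, steps in runs) / len(runs)
```

Each guess is correct with probability one half in this example, so the number of steps per run is geometrically distributed with mean two.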

Sensing  and  the  Speed  of  Progress 

Let  us  discuss  the  role  of  sensors  in  determining  whether  progress  is  possible  at  a  given 
state. Consider a state s and its collection of possible sensory interpretation sets $\Xi(s)$.
If for all sensory interpretation sets $I \in \Xi(s)$, it is possible to select an action $A_I$ that
ensures progress independent of which state in I is the actual state, then in particular it is possible
to ensure progress at s. Furthermore, if one considers the sensor to be adversarial,
then one may assume that the sensor always forces that interpretation set I for which
the action $A_I$ makes the least amount of progress at state s. Thus it makes sense to
define the worst-case velocity at s to be

$$v_s = \max_{I \in \Xi(s)} v_{A_I, s},$$

which agrees with the definition (3.20).

By  similar  reasoning,  if  the  sensor  is  adversarial,  and  there  is  some  possible 
interpretation set $I \in \Xi(s)$ for which progress is not ensurable independent of the
actual state giving rise to I, then progress may not be guaranteed at s. Instead
the action to be executed is chosen probabilistically from some collection
$\mathcal{A}_I = \{A_1, \ldots, A_q\}$ that corresponds to the collection of knowledge states $\{K_i\}$ that cover
I. In this case, it makes sense to define a worst-case average velocity, namely as:

$$v_s = \max_{I \in \Xi(s)} \frac{1}{|\mathcal{A}_I|} \sum_{A \in \mathcal{A}_I} v_{A,s}.$$

The point is that whenever the system is in state s and sensory interpretation
set I occurs, on average the guessing strategy will make progress that is at least
$-\left(\sum_{A \in \mathcal{A}_I} v_{A,s}\right) / |\mathcal{A}_I|$. Thus an adversarial sensor can only try to minimize this
quantity by selecting sensory interpretation sets I that behave poorly. Once again,
if $v_s$ is negative for all states and bounded away from zero by $v$, then the worst-case
average execution time will be bounded by the maximum label divided by $-v$.
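
The worst-case average velocity and the resulting time bound can be computed directly for a small case. The example below, in which a single interpretation set covers two states and each of two candidate actions makes progress at only one of them, is hypothetical; the names and the uniform weighting over candidate actions are assumptions of this sketch.

```python
def worst_case_velocity(state, action, label):
    """Largest signed label change that the action can cause at the state."""
    return max(label[t] - label[state] for t in action[state])


def worst_case_average_velocity(state, sensing, candidates, label):
    """Maximum over interpretation sets of the mean velocity of their candidate actions."""
    averages = []
    for i_set in sensing[state]:
        actions = candidates[i_set]
        total = sum(worst_case_velocity(state, a, label) for a in actions)
        averages.append(total / len(actions))
    return max(averages)


I_12 = frozenset({"s1", "s2"})
SENSING = {"s1": {I_12}, "s2": {I_12}}
A1 = {"s1": {"sG"}, "s2": {"s2"}}  # progress at s1 only
A2 = {"s1": {"s1"}, "s2": {"sG"}}  # progress at s2 only
LABEL = {"sG": 0, "s1": 1, "s2": 1}
CANDIDATES = {I_12: [A1, A2]}

velocities = {
    s: worst_case_average_velocity(s, SENSING, CANDIDATES, LABEL)
    for s in ("s1", "s2")
}
v = max(velocities.values())            # bound on the velocity over all non-goal states
time_bound = max(LABEL.values()) / -v   # maximum label divided by -v
```

Here each state has velocity of minus one half, so the bound on the expected execution time is two steps.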

This  process  generalizes  as  one  changes  adversarial  actions  to  probabilistic  actions, 
and/or  adversarial  sensors  to  probabilistic  sensors,  until  one  eventually  gets  a  process 
resembling  the  Markov  chains  discussed  earlier  in  this  chapter. 


3.12.4  Partial  Adversaries 

In  the  discussion  on  non-deterministic  tasks  thus  far,  it  has  been  assumed  that  an 
adversary  can  always  force  the  worst  possible  motion  or  sensing  information  at  any 
instant  at  any  state.  However,  for  some  physical  tasks  the  non-determinism  specified 
in  the  actions  and  sensing  function  is  due  to  a  paucity  of  knowledge  in  modelling 
the  system,  rather  than  the  existence  of  an  actual  adversary.  In  other  words,  the 
actual  transitions  or  sensor  values  obtained  depend  on  some  set  of  parameters  whose 
exact  values  are  unknown,  and  hence  are  modelled  as  non-deterministic  uncertainty. 
The actual system behaves in a manner consistent with a particular instantiation of
these  parameters.  This  means  that  the  range  of  transitions  possible  in  response  to  an 
action  and/or  the  sensory  interpretation  sets  obtained  from  a  sensor  are  coupled  at 
different  states  of  the  system.  In  short,  if  an  adversary  can  choose  a  bad  transition 
at  some  state,  this  may  reflect  a  particular  instantiation  of  the  unknown  parameters 
that  precludes  an  independent  worst-case  choice  at  some  other  state. 

As an example, consider the case of a sensor with an unknown bias. Specifically,
suppose that there is a sensor that returns a sensed position $x^*$ that lies within some
error ball about some unknown bias, denoted by $B_\epsilon(x + b)$. The notation is meant
to convey the idea that the actual state is $x$, and that the error ball is centered at
a point that is offset from $x$ by some bias $b$, and has radius $\epsilon$. This notation makes
a lot of sense in a vector space such as $\Re^n$, in which the error ball might represent
the support of some distribution function describing the possible sensor values (see
also the examples of sections 2.2.2 and 2.4). Conceptually, we can imagine a similar
situation for discrete tasks. If the bias $b$ is known, then whenever one sees a sensor
value $x^*$, the resulting sensory interpretation set implies that the actual state of the
system must lie in the set $B_\epsilon(x^* - b)$. However, if the value of $b$ is not known exactly,
but can merely be bounded in magnitude, say as $|b| < b_{\max}$, then one can merely
assert that the actual state of the system lies in the set $B_{\epsilon + b_{\max}}(x^*)$. One would
model this non-deterministically by saying that the sensing function $\Xi$ can return for
each state $x$ one of a collection of error balls of radius $\epsilon + b_{\max}$, namely the collection
$\{B_{\epsilon + b_{\max}}(x^*)\}$, as $x^*$ varies over $B_{\epsilon + b_{\max}}(x)$. This suggests that if the state of the
system is $x$, then an adversary could choose any sensor value $x^*$ that lies within the
error ball $B_{\epsilon + b_{\max}}(x)$. Of course, that is not true. An adversary can merely choose
any sensor value from the range $B_\epsilon(x + b)$, for some actual but unknown $b$.
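
The gap between the modelled and the actual adversary can be made concrete in a discrete, one-dimensional sketch. Integer positions, closed error balls, a bias bound of the form $|b| \le b_{\max}$, and all function names are assumptions made for this illustration.

```python
def ball(center, radius):
    """Discrete one-dimensional error ball: all integers within radius of center."""
    return set(range(center - radius, center + radius + 1))


def interpretation_known_bias(reading, eps, bias):
    """States consistent with a reading when the bias is known."""
    return ball(reading - bias, eps)


def interpretation_unknown_bias(reading, eps, bias_max):
    """States consistent with a reading when only a bound on the bias is known."""
    return ball(reading, eps + bias_max)


def readings_modelled(x, eps, bias_max):
    """Readings that the non-deterministic model lets an adversary choose at state x."""
    return ball(x, eps + bias_max)


def readings_actual(x, eps, bias):
    """Readings that can really occur at state x, given the actual (unknown) bias."""
    return ball(x + bias, eps)


x, eps, bias_max, bias = 10, 1, 3, 2
actual = readings_actual(x, eps, bias)
modelled = readings_modelled(x, eps, bias_max)
```

With these numbers the actual readings form a three-element subset of the nine readings allowed by the non-deterministic model, which is the sense in which the modelled adversary is too powerful.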


Now  consider  a  task  in  which  the  non-determinism  is  so  great  that  there  is  no 
strategy  that  ensures  progress  at  each  state,  relative  to  some  labelling.  However, 
suppose  further  that  there  exists  an  unmodelled  parameter,  such  as  the  bias  b  in  the 
previous  example,  whose  instantiation  would  permit  progress  for  a  large  number  of 
states.  In  other  words,  given  a  particular  instantiation  of  this  parameter  one  can 
devise  a  strategy  for  which  the  mix  of  states  at  which  progress  is  possible  and  states 
at  which  progress  is  not  possible  is  sufficient  to  ensure  goal  attainment  within  some 
time  bound.  If  this  strategy  is  actually  independent  of  the  particular  instantiation  of 
the  unknown  parameter,  then  the  strategy  is  assured  of  goal  attainment  within  the 
desired  time  bound. 


An  example  is  given  again  by  the  sensing  bias  mentioned  above.  If  the  task  is 
to  move  to  some  region  based  on  sensor  values,  then  for  certain  approach  directions 
the  bias  will  aid  in  attaining  the  goal,  while  for  other  approach  directions  the  bias 
will  hinder  attainment  (recall  the  peg-in-hole  example  of  section  1.1).  One  can  take 
advantage  of  the  bias,  without  knowing  its  true  value,  simply  by  executing  a  strategy 
of  the  type  discussed  in  this  section.  Specifically,  whenever  the  system  can  make 
progress  towards  the  goal,  it  does  so,  and  otherwise  it  executes  a  random  motion. 
The  random  motion  ensures  that  if  the  system  is  in  a  region  in  which  the  bias  is 
precluding  sensory  interpretation  sets  that  ensure  progress  towards  the  goal,  then 
eventually  the  system  will  either  attain  the  goal  or  drift  out  of  that  region  and  into 
another  region  within  which  the  bias  facilitates  goal  attainment. 

3.13 Some Complexity Results for Near-Sensorless Tasks

In  this  section  we  consider  a  special  form  of  the  discrete  planning  problem,  namely  one 
in  which  the  sensors  provide  no  information  other  than  to  signal  goal  attainment.  We 
shall  refer  to  such  problems  as  near-sensorless.  Sensorless  tasks  form  an  important 
subclass  of  the  set  of  robot  tasks.  Mason  (see,  for  example,  [Mas85]  and  [Mas86])  has 
studied  these  problems  extensively.  The  motivation  for  studying  sensorless  problems 
stems  from  the  realization  that  almost  all  tasks  involve  some  operations  in  which 
the  mechanics  of  object  interactions  dominates  any  informational  content  provided 
by  the  sensors.  For  instance,  in  grasping  or  pushing  objects,  even  if  sensors  are 
available  to  provide  a  general  sense  of  the  object’s  behavior,  the  behavior  of  the 
object  at  the  instant  of  contact  tends  to  lie  below  the  resolution  of  the  sensors.  Thus 
it  is  important  to  understand  the  behavior  of  objects,  and  the  manner  by  which 
one  can  control  them,  in  the  absence  of  sensory  information.  [Brost86]  and  [Pesh] 
have  further  explored  sensorless  grasping  and  pushing.  [MW],  [EM]  and  [Nat86]  have 
looked  at  other  tasks  that  are  amenable  to  sensorless  solutions,  such  as  the  problem 
of  unambiguously  orienting  an  object  given  complete  uncertainty  as  to  the  object’s 
initial  configuration,  and  [Wang]  has  studied  extensively  the  impact  problem. 

In  terms  of  the  previous  discussion  in  this  chapter,  we  have  seen  that  tasks  in 
which  sensing  is  perfect  can  be  solved  very  quickly.  For  fixed  control  uncertainty, 
one  may  thus  view  sensing  uncertainty  as  the  devil  that  confounds  one’s  guaranteed 
strategies.  Indeed,  randomized  strategies  were  formulated  precisely  as  a  means  for 
pretending  to  reduce  sensing  uncertainty,  by  simply  guessing  the  state  of  the  system, 
that  is,  by  guessing  the  correct  sensor  value.  Thus  it  is  natural  to  look  at  the  extreme 
case  in  which  there  is  no  sensing  whatsoever.  However,  in  order  to  satisfy  the  goal 
recognition  criterion,  we  will  insist  that  the  goal  be  recognizable.  In  short,  there  is 
some  sensing,  but  it  is  limited  to  deciding  whether  or  not  the  goal  has  been  attained. 

In  this  section  we  will  first  briefly  outline  how  the  general  backchaining  planners 
discussed  earlier  specialize  to  the  sensorless  case,  then  indicate  that  sensorless  and 
near-sensorless  tasks  are  essentially  equivalent  from  the  point  of  view  of  generating 
guaranteed  strategies.  The  main  thrust  of  this  section,  however,  is  given  by  three 
examples  that  indicate  the  complexity  of  planning  with  and  without  randomization. 
We  know,  of  course,  from  [PT]  that  planning  solutions  to  discrete  tasks  in  the  absence 
of  sensing  is  NP-complete.  Specifically,  for  probabilistic  problems  in  which  there  are 
costs  associated  with  transitions,  the  problem  of  deciding  whether  or  not  there  is  a 
sensorless  solution  of  a  fixed  number  of  steps  that  incurs  zero  cost  is  NP-complete. 
The  three  examples  in  this  section  elaborate  on  this  type  of  result.  Specifically,  we 
will  look  at  non-deterministic  problems,  and  merely  ask  for  the  existence  of  a  solution, 
not  the  existence  of  an  optimal  solution.  This  is  equivalent  to  assigning  costs  that  are 
either  zero  or  infinite,  depending  on  whether  the  goal  is  attained  or  not.  Furthermore, 
we  are  interested  in  the  comparison  between  guaranteed  solutions  and  randomized 
solutions. 

All three examples are abstract examples on graphs. Whether these can be
actually  realized  by  physical  devices  is  not  investigated.  However,  at  the  end  of  this 
section  we  indicate  a  physical  device  that  has  some  of  the  same  properties  as  the  first 
example.  The  first  example  demonstrates  a  task  for  which  there  exists  a  guaranteed 
strategy,  but  which  requires  exponential  time  to  plan  and  execute.  In  addition,  there 
exists  a  randomized  strategy  that  only  requires  quadratic  expected  time  to  attain 
the  goal.  This  example  indicates  that  some  problems  can  actually  be  solved  more 
quickly  by  randomized  strategies  than  by  guaranteed  strategies.  The  second  example 
indicates  that  not  all  problems  can  be  solved  quickly  by  randomization.  And  the 
third  example  shows  that  the  particular  planning  approach  used  may  investigate  an 
exponential  number  of  knowledge  states  even  when  the  number  of  plan  steps  is  fixed. 

3.13.1  Planning  and  Execution 

Let  us  briefly  outline  how  a  system  might  plan  solutions  to  tasks  in  the  sensorless 
and  near-sensorless  settings.  Towards  this  goal,  it  is  useful  to  consider  the  effect 
that  actions  and  sensing  operations  have  on  knowledge  states.  Recall  the  notation  of 
section  3.9. 

Suppose that a system is initially in knowledge state $K$ and suppose that at
execution time a sequence of actions $\{A_1, \ldots, A_k\}$ is executed yielding a sequence
of sensory interpretation sets $\{I_1, \ldots, I_k\}$. The final knowledge state resulting from
this particular execution trace is given by $K; A_1; I_1; \cdots; A_k; I_k$. In general, of course,
a plan might specify a decision tree, so that the actions executed are themselves
functions of the observed sensory information. In the sensorless case, the sensing at
each stage provides no additional information, so one can write the execution trace
as $K; A_1; \mathcal{S}; \cdots; A_k; \mathcal{S}$. In particular, it is possible to decide before execution whether
or not the resulting knowledge state is inside the goal set $\mathcal{G}$.

This  simplifies  planning  greatly.  It  means  that  all  actions  may  be  viewed  as 
deterministic  transitions  in  the  space  of  knowledge  states.  Recall  that  in  converting  an 
action  on  the  underlying  state  space  into  an  action  in  the  knowledge  space,  it  was  only 
the  intersection  with  possible  resulting  sensory  interpretation  sets  that  introduced  any 
non-determinism.  (See  page  143.)  Backchaining  using  dynamic  programming  thus 
entails determining whether there is a path from $K$ to $\mathcal{G}$ in the directed graph whose
states  are  knowledge  states  and  whose  arcs  are  the  possible  deterministic  transitions 
specified  by  the  actions.  [This  was  essentially  the  approach  taken  by  [EM]  and  [MW] 
in  planning  sensorless  orienting  strategies.]  In  short,  a  guaranteed  strategy  consists 
of  a  linear  sequence  of  actions,  not  a  general  decision  tree. 
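
The search just described can be sketched in a few lines. The sketch below is an assumption-laden illustration, not the planner of [EM] or [MW]: it searches forward from the initial knowledge state by breadth-first search rather than backchaining from the goal, it represents each action as a dictionary from a state to the set of its possible successors, and the action names in the usage example are invented.

```python
from collections import deque


def forward_projection(action, K):
    """F_A(K): union of the possible successors of every state in K."""
    return frozenset().union(*(action[s] for s in K))


def sensorless_plan(actions, K0, goal):
    """Shortest linear action sequence taking K0 inside the goal, or None."""
    start, goal = frozenset(K0), frozenset(goal)
    parent = {start: None}          # knowledge state -> (predecessor, action name)
    queue = deque([start])
    while queue:
        K = queue.popleft()
        if K <= goal:               # every possible state is a goal state
            plan = []
            while parent[K] is not None:
                K, name = parent[K]
                plan.append(name)
            return plan[::-1]
        for name, action in actions.items():
            K2 = forward_projection(action, K)   # deterministic in knowledge space
            if K2 not in parent:
                parent[K2] = (K, name)
                queue.append(K2)
    return None


# Hypothetical three-state task with goal state 1.
demo_actions = {
    "left":   {1: {1}, 2: {1}, 3: {2}},
    "jiggle": {1: {1, 2}, 2: {2, 3}, 3: {3}},
}
demo_plan = sensorless_plan(demo_actions, {1, 2, 3}, {1})
```

Because every action is a deterministic arc between knowledge states, the returned plan is a plain list of action names, in keeping with the remark that a guaranteed sensorless strategy is a linear sequence.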

In the near-sensorless case each of the sensory interpretation sets $I_i$ is either the
whole non-goal space $\bar{\mathcal{S}} = \mathcal{S} - \mathcal{G}$ or the goal set $\mathcal{G}$. If we assume that an execution
trace stops once the goal is attained, then each successful execution trace is of the
form $K; A_1; \bar{\mathcal{S}}; A_2; \bar{\mathcal{S}}; \cdots; A_{k-1}; \bar{\mathcal{S}}; A_k; \mathcal{G} \subseteq \mathcal{G}$. Clearly, in this case the actions are
indeed functions of the sensory information. In particular, the number of actions
executed depends on when the goal is entered, an event that is only determined in a
non-deterministic fashion at execution time. However, as in the sensorless case, for a
guaranteed strategy there is a definite sequence of actions that will be executed if the
system does not enter the goal. Said differently, the decision tree is not really a general
tree, but rather a linear sequence with one-step branches at each step corresponding
to early goal attainment. See figure 3.19.

Figure 3.19: Decision trees for different types of strategies

In the space of knowledge states all actions are thus either deterministic or
non-deterministic with two possible target states. In particular, suppose $K$ is a knowledge
state and $A$ an action. If the forward projection $F_A(K)$ contains no goal states then
one has a deterministic transition $A : K \mapsto F_A(K)$. Otherwise, one either has a
non-deterministic transition $A : K \mapsto F_A(K) - \mathcal{G},\ \mathcal{G}$; or complete goal attainment
$A : K \mapsto \mathcal{G}$. Again, planning by backchaining corresponds to determining a path
from $K$ to $\mathcal{G}$ in a directed graph. As before, the states of the graph are the knowledge
states. The arcs are simply the non-sensing transitions specified by the actions. This
means that there is an arc labelled with $A$ directed from $K$ to $F_A(K) - \mathcal{G}$ whenever
$F_A(K)$ contains at least one non-goal state, and otherwise there is an arc directed
from $K$ to $\mathcal{G}$ labelled with $A$. A sequence of such directed arcs leading from $K$ to $\mathcal{G}$
represents the longest possible execution trace of the guaranteed strategy for attaining
the goal.

One sees then that planning in the sensorless and near-sensorless settings is
very similar. In the sensorless case one seeks a sequence of actions $\{A_1, A_2, \ldots, A_k\}$
such that $K; A_1; \mathcal{S}; A_2; \mathcal{S}; \cdots; A_k; \mathcal{S} \subseteq \mathcal{G}$, while in the near-sensorless case one seeks
a sequence of actions such that $K; A_1; \bar{\mathcal{S}}; A_2; \bar{\mathcal{S}}; \cdots; A_k; \mathcal{S} \subseteq \mathcal{G}$. Here $\bar{\mathcal{S}} = \mathcal{S} - \mathcal{G}$ is the
set of non-goal states. In the sensorless case the entire sequence of actions is always
executed, while in the near-sensorless case the entire sequence is only executed in the
worst case.
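
A minimal sketch of the near-sensorless search follows, under the same assumptions as before (actions as dictionaries from a state to its set of possible successors, forward breadth-first search instead of backchaining). The only change from the sensorless case is that goal states are removed from each forward projection, since the goal sensor would have stopped the execution there. The action name `cycle` is invented for the example.

```python
from collections import deque


def near_sensorless_plan(actions, K0, goal):
    """Shortest worst-case action sequence when only goal attainment is sensed."""
    goal = frozenset(goal)
    start = frozenset(K0) - goal        # goal states are recognized immediately
    if not start:
        return []
    parent = {start: None}              # knowledge state -> (predecessor, action name)
    queue = deque([start])
    while queue:
        K = queue.popleft()
        for name, action in actions.items():
            # Arc K -> F_A(K) - G: the states that remain if the goal was NOT entered.
            K2 = frozenset().union(*(action[s] for s in K)) - goal
            if K2 in parent:
                continue
            parent[K2] = (K, name)
            if not K2:                  # F_A(K) lies entirely inside the goal
                plan, node = [], K2
                while parent[node] is not None:
                    node, step = parent[node]
                    plan.append(step)
                return plan[::-1]
            queue.append(K2)
    return None


# Hypothetical task: the single action passes through goal state 1 but does not stay.
cycle_actions = {"cycle": {1: {2}, 2: {1}, 3: {2}}}
cycle_plan = near_sensorless_plan(cycle_actions, {2, 3}, {1})
```

In this example no sensorless plan exists, because the action never confines the system to the goal, whereas the goal sensor lets the system stop as soon as it passes through state 1. The returned list is the longest trace; an actual execution may terminate after the first step.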

3.13.2  Partial  Equivalence  of  Sensorless  and 
Near-Sensorless  Tasks 

In the previous discussion, we saw a strong similarity between sensorless and
near-sensorless tasks in terms of the structure of guaranteed solutions. The following
paragraphs  will  make  this  similarity  more  precise. 

Consider a discrete planning problem $(\mathcal{S}, \mathcal{A}, \Xi, \mathcal{G})$ in which the sensing function
returns no information. In other words, $\Xi(s) = \{\mathcal{S}\}$ for every state $s$. Now consider
a modified problem $(\mathcal{S}', \mathcal{A}', \Xi', \mathcal{G}')$ for which the set of states is augmented by one
new state $s_G$, which now becomes the goal state. In other words $\mathcal{S}' = \mathcal{S} \cup \{s_G\}$ and
$\mathcal{G}' = \{s_G\}$. Furthermore, define $\mathcal{A}'$ essentially to be just $\mathcal{A}$ with one additional action
$A_G$, whose effect we will describe shortly. In particular, for any action $A \in \mathcal{A}$ let $A$
have precisely the same effect on states in $\mathcal{S}$ as before, and let its effect on the new
state $s_G$ be a self-transition. In other words, $A : s_G \mapsto s_G$. The additional action
$A_G$ is designed to move any goal state in the old system into $s_G$, and otherwise
non-deterministically move to any one of the states in $\mathcal{S}$. In other words, if the states of
the original system are given by $\mathcal{S} = \{s_1, \ldots, s_n\}$, with goal states $\mathcal{G} = \{s_1, \ldots, s_r\}$,
then $A_G$ is specified by:

\[
\begin{array}{rrcl}
A_G : & s_1 & \mapsto & s_G \\
      & \vdots & & \\
      & s_r & \mapsto & s_G \\
      & s_{r+1} & \mapsto & s_1, \ldots, s_n \\
      & \vdots & & \\
      & s_n & \mapsto & s_1, \ldots, s_n \\
      & s_G & \mapsto & s_G
\end{array}
\]

Finally, define a new sensing function $\Xi'$ that gives partial sensing. In particular,
$\Xi'$ permits goal recognizability in the new system. This is modelled as $\Xi'(s) = \{\mathcal{S}\}$
for every $s \in \mathcal{S}$, and $\Xi'(s_G) = \{\{s_G\}\}$.
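
The construction above is mechanical, and a short sketch may make it concrete. The representation (actions as dictionaries from a state to its set of possible successors, the sensing function as a Python function returning a list of interpretation sets) and the label `"s_G"` are assumptions of this sketch.

```python
def add_goal_state(states, actions, goal, s_G="s_G"):
    """Build the near-sensorless problem (S', A', Xi', G') from a sensorless one."""
    states = set(states)
    new_states = states | {s_G}
    new_actions = {}
    for name, action in actions.items():
        lifted = {s: set(action[s]) for s in states}   # unchanged on the old states
        lifted[s_G] = {s_G}                            # self-transition at s_G
        new_actions[name] = lifted
    # A_G moves old goal states into s_G and scatters all other states over S.
    A_G = {s: ({s_G} if s in goal else set(states)) for s in states}
    A_G[s_G] = {s_G}
    new_actions["A_G"] = A_G

    def new_sensing(s):
        """Xi': recognizes s_G, and is uninformative everywhere else."""
        return [frozenset({s_G})] if s == s_G else [frozenset(states)]

    return new_states, new_actions, {s_G}, new_sensing


# Hypothetical three-state sensorless task with goal state 1.
S2, A2, G2, Xi2 = add_goal_state(
    {1, 2, 3}, {"left": {1: {1}, 2: {1}, 3: {2}}}, {1}
)
```

Note that `A_G` is only safe to execute once the system is known to be inside the old goal set; applied anywhere else it returns the system to complete uncertainty, which is why the goal sensor gives no useful information before the original plan has finished.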

Let us compare solutions to the two problems. Suppose that the unmodified
system starts in knowledge state $K$ and that there is a sequence of actions $A_1, \ldots, A_k$
and a sequence of (no-op) sensing operations $I_1, \ldots, I_k$ such that at execution time the
final knowledge state $K; A_1; I_1; \cdots; A_k; I_k$ lies inside the goal $\mathcal{G}$. Then clearly, for
the modified system, the execution trace $K; A_1; I'_1; \cdots; A_k; I'_k; A_G; I'_G$ must be the
singleton set $\{s_G\}$. Here each of the $I'_i$ are the sensory interpretation sets returned by
the modified sensing function $\Xi'$. Clearly $I'_i = I_i = \mathcal{S}$ for each $i = 1, \ldots, k$ and
$I'_G = \{s_G\}$. Conversely, suppose that in the modified system there is a sequence of
actions and sensing operations starting from some knowledge state $K \subseteq \mathcal{S}$, such that
the final knowledge state is guaranteed to be the goal state $\{s_G\}$. Then, eliminating
superfluous actions, clearly the last action must be $A_G$, and the execution trace up
until this last action must be guaranteed to place the system into the original goal
set $\mathcal{G}$, using only actions in $\mathcal{A}$.

In  short,  if  there  is  a  strategy  for  knowingly  achieving  the  goal  in  the  sensorless 
system,  then  there  is  a  strategy  for  knowingly  achieving  the  goal  in  the  near-sensorless 
system,  and  conversely.  Thus  the  existence  and  structure  of  a  guaranteed  strategy 
for  accomplishing  a  sensorless  task  is  not  fundamentally  affected  by  the  addition  of 
a  goal-sensor;  one  can  always  modify  the  problem  slightly  so  that  the  goal-sensor 
does  not  provide  any  information  useful  to  the  guaranteed  strategy.  However,  it  is 
clearly  true  that  in  general,  that  is,  for  unmodified  tasks,  the  goal-sensor  does  provide 
some  additional  information.  In  particular,  if  a  motion  happens  to  stray  into  the  goal 
region,  a  goal-sensor  will  detect  this.  In  contrast,  a  sensorless  system  would  not 
necessarily  be  able  to  guarantee  goal  recognition.  This  property  will  be  useful  in  the 
context  of  random  strategies,  as  we  shall  see  presently. 

One can also establish a correspondence in the other direction, that is, one can
convert any near-sensorless problem into a sensorless one with minor modifications,
while preserving the existence and essentially the structure of guaranteed strategies.
The basic idea is to replace the goal-sensor with a mechanical trap that precludes
ever leaving the goal once it has been attained. So, suppose we are given a discrete
planning problem $(\mathcal{S}, \mathcal{A}, \Xi, \mathcal{G})$ in which the state space is $\mathcal{S} = \{s_1, \ldots, s_n\}$ and the
goal states are $\mathcal{G} = \{s_1, \ldots, s_r\}$. The sensor can recognize goal attainment, but
otherwise provides no information. Thus $\Xi(s) = \{\mathcal{S} - \mathcal{G}\}$ for all non-goal states $s$
and $\Xi(g) = \{\mathcal{G}\}$ for all goal states $g$. Now, consider a modified problem $(\mathcal{S}, \mathcal{A}', \Xi', \mathcal{G})$,
which has the same state space and goal set as the previous problem, but modified
actions and a modified sensing function. The new sensing function $\Xi'$ provides no
information whatsoever, that is, $\Xi'(s) = \{\mathcal{S}\}$ for all states. The new actions are
identical to the old, except that transitions out of goal states have been changed to
self-transitions.
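
The trap construction amounts to a one-line transformation of the actions. The sketch below assumes the same dictionary representation of actions used in the other sketches of this section; the function names and the example action `cycle` are invented.

```python
def replace_sensor_by_trap(actions, goal):
    """Return modified actions A' whose transitions out of goal states are self-loops."""
    return {
        name: {s: ({s} if s in goal else set(targets)) for s, targets in action.items()}
        for name, action in actions.items()
    }


def project(action, K):
    """Sensorless forward projection: union of the images of the states in K."""
    return set().union(*(action[s] for s in K))


# Hypothetical near-sensorless task: the action passes through goal state 1.
original = {"cycle": {1: {2}, 2: {1}, 3: {2}}}
trapped = replace_sensor_by_trap(original, {1})

# Executing the action twice without any sensing, from knowledge state {2, 3}.
after_two_original = project(original["cycle"], project(original["cycle"], {2, 3}))
after_two_trapped = project(trapped["cycle"], project(trapped["cycle"], {2, 3}))
```

In the original problem the two-step sequence succeeds only because the sensor stops execution when state 1 is entered. With the trap in place the same sequence confines the system to the goal without any sensing.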

Consider again an execution trace in the original system from some initial
knowledge state $K$ into the goal set $\mathcal{G}$, that is $K; A_1; I_1; \cdots; A_k; I_k \subseteq \mathcal{G}$. Assuming
a worst-case scenario, in which an adversary always forces non-goal transitions,
the discussion from section 3.13.1 allows us to assume that the sequence of
actions is a guaranteed plan for attaining the goal from $K$. In other words,
$K; A_1; \bar{\mathcal{S}}; A_2; \bar{\mathcal{S}}; \cdots; A_k; \mathcal{S} \subseteq \mathcal{G}$, where $\bar{\mathcal{S}} = \mathcal{S} - \mathcal{G}$. Since the modified actions $A'_i$ leave
goal states invariant, we have also that $K; A'_1; \mathcal{S}; A'_2; \mathcal{S}; \cdots; A'_k; \mathcal{S} \subseteq \mathcal{G}$. In short, the
modified sequence of actions is a guaranteed plan in the modified sensorless problem.
Conversely, it is clear that any sequence of actions guaranteed to attain the goal in
the modified sensorless problem is also a sequence of actions guaranteed to attain the
goal in the original near-sensorless problem. This is because the effect of an action
on a goal state is irrelevant if the goal is recognizable.

In  terms  of  finding  guaranteed  strategies,  we  see  that  sensorless  and  near-sensorless 
problems  are  very  similar.  Adding  a  goal  sensor  to  a  sensorless  problem  does  not 
change  the  structure  of  the  problem  much,  if  the  applicability  of  the  sensor  depends 
on  first  executing  a  proper  action.  Conversely,  for  a  near-sensorless  problem,  removing 
the  sensor  does  not  change  the  problem  substantially,  if  the  sensor  can  be  replaced 
by  a  physical  trap. 

3.13.3  Probabilistic  Speedup  Example 

Let  us  turn  to  the  first  example.  See  section  3.13.6  below,  for  a  physical  device  that 
has  important  commonalities  with  the  following  example. 

We  will  construct  a  non-deterministic  discrete  planning  problem,  consisting  of  n 
states  and  n  actions.  There  will  be  one  goal  state,  and  no  sensing.  We  will  exhibit  a 
guaranteed  solution  for  attaining  the  goal  from  an  initial  knowledge  state  of  complete 
uncertainty. The solution requires $2^n - n - 1$ steps, and is the shortest possible solution
guaranteed  to  attain  the  goal.  However,  if  the  starting  state  is  known  exactly,  there 
will  be  solutions  of  linear  length.  This  suggests  a  guessing  strategy  that  guesses 
the  initial  state,  thus  attaining  the  goal  in  quadratic  expected  time.  Of  course,  one 
must  add  a  goal-sensor  to  recognize  goal  attainment.  However,  doing  so  does  not 
change  the  fundamental  character  of  the  problem,  as  one  could  always  perform  the 
modifications  suggested  in  section  3.13.2.  This  example  demonstrates  that  there  are 
tasks  for  which  randomization  can  speed  up  execution  time.  Furthermore,  by  the 
discussion  in  section  3.9,  it  is  easy  to  decide  whether  there  exists  a  fast  randomized 
solution  that  randomizes  by  guessing  the  initial  state  of  the  system. 

Let the states be $\mathcal{S} = \{s_1, \ldots, s_n\}$, with the goal being state $s_1$. For convenience
we will sometimes refer to states by their indices, and specify knowledge states as
subsets of the integers. Thus $K = \{1, 2, 7\}$ means that the system is in one of the
states $s_1$, $s_2$, or $s_7$, that is, $K = \{s_1, s_2, s_7\}$ in the usual notation.

The actions will have the following effect. Essentially, we want to force the system
to traverse almost all knowledge states, beginning from $\{1, 2, \ldots, n\}$, before arriving
at the goal $\{1\}$. Specifically, the system will be forced to first traverse all knowledge
states of size $n - 1$, then all knowledge states of size $n - 2$, and so forth, through
all knowledge states of size 2, until finally arriving at the goal $\{1\}$. Furthermore,
within a collection of knowledge states of a given size, the system will be forced
to traverse the knowledge states in lexicographic order. The lexicographic order
of a knowledge state $K = \{s_{i_1}, s_{i_2}, \ldots, s_{i_k}\}$ (also written as $K = \{i_1, i_2, \ldots, i_k\}$)
containing $k$ elements is determined by the string $s_{i_1} s_{i_2} \cdots s_{i_k}$ of length $k$, where the
$\{s_{i_j}\}$ are assumed to be ordered in such a way that $i_1 < i_2 < \cdots < i_k$. As an
example, the knowledge state $\{2, 1, 7, 12\}$ precedes the knowledge state $\{3, 6, 1, 7\}$
since $s_1 s_2 s_7 s_{12} < s_1 s_3 s_6 s_7$ lexicographically. Observe that the first state of length $k$
in this ordering is the knowledge state $K^{(k)}_{\min} = \{1, 2, \ldots, k\}$, whereas the last state is
$K^{(k)}_{\max} = \{n - k + 1, n - k + 2, \ldots, n\}$. We will refer to the collection of knowledge
states of size $k$ as the $k$-th level.

For the sake of example, consider the case $n = 4$. The relevant knowledge states
and the order in which the system will be forced to traverse them is given by the
following sequence, arranged by level. Within each level the knowledge states are
listed in lexicographic order from left to right.

Level 4:  {1,2,3,4}
Level 3:  {1,2,3} → {1,2,4} → {1,3,4} → {2,3,4}
Level 2:  {1,2} → {1,3} → {1,4} → {2,3} → {2,4} → {3,4}
Level 1:  {1}
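
The forced traversal order can be enumerated directly, which also gives a quick check of the step count quoted earlier. This is a small sketch; the function name is invented, and it relies on the fact that `itertools.combinations` emits tuples in lexicographic order when its input is sorted.

```python
from itertools import combinations


def traversal_order(n):
    """Knowledge states in the order the system is forced to visit them."""
    order = []
    for k in range(n, 1, -1):                             # levels n, n-1, ..., 2
        order.extend(combinations(range(1, n + 1), k))    # lexicographic within a level
    order.append((1,))                                    # the goal {1}
    return order


order4 = traversal_order(4)
steps4 = len(order4) - 1        # one action per arrow in the listing above
```

For $n = 4$ the list contains twelve knowledge states, hence eleven steps. In general the levels of size $2, \ldots, n$ contain $2^n - n - 1$ knowledge states in total, and adding the goal gives $2^n - n$ states and therefore $2^n - n - 1$ steps.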

The first action $A_0$ that we will specify is designed to permit motions between
levels, specifically from the last state in each level to the first state in the next lower
level, that is, from $K^{(k)}_{\max}$ to $K^{(k-1)}_{\min}$ for all $k = n, \ldots, 2$. In addition, $A_0$ should not
be useful for any other motions, that is, the action should not be capable of moving
the system ahead more than one knowledge state in the order that we just specified.
This means that the only other motions possible should move either to a higher level
or to a previous state in the same level. The action is given as:

\[
\begin{array}{rrcl}
A_0 : & 1 & \mapsto & 1, 2, \ldots, n - 1 \\
      & 2 & \mapsto & 1, 2, \ldots, n - 2 \\
      & \vdots & & \\
      & k & \mapsto & 1, 2, \ldots, n - k \\
      & \vdots & & \\
      & n - 2 & \mapsto & 1, 2 \\
      & n - 1 & \mapsto & 1 \\
      & n & \mapsto & 1
\end{array}
\]

Since there is no sensing, we will write $A(K)$ to mean $F_A(K)$ for any action $A$
and any knowledge state $K$. Observe then that indeed $A_0(K^{(k)}_{\max}) = K^{(k-1)}_{\min}$. Now
consider an arbitrary knowledge state with $k$ elements, say $K = \{i_1, i_2, \ldots, i_k\}$, with
$i_1 < i_2 < \cdots < i_k$. Then $A_0(K) = A_0(\{i_1\}) = \{1, 2, \ldots, n - i_1\}$. If we suppose that
$K$ is not $K^{(k)}_{\max}$, then it must be the case that $i_1 < n - k + 1$. This in turn implies
that $A_0(K) \supseteq A_0(\{n - k\}) = \{1, 2, \ldots, k\} = K^{(k)}_{\min}$. In other words, either $A_0(K)$
contains $k$ elements and is equal to the least such set, or $A_0(K)$ contains more than
$k$ elements. In either event $A_0(K)$ appears at or before $K$ in the sequence of knowledge
states that we are forcing the system to traverse. Thus $A_0$ cannot be used to any
advantage in jumping ahead in that sequence.
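
The two observations about $A_0$ can be checked exhaustively for small $n$. The sketch below assumes the reading of $A_0$ in which state $s < n$ is sent to $\{1, \ldots, n - s\}$ and state $n$ is sent to $\{1\}$, which agrees with the instance for $n = 4$ given in the text; the function names are invented.

```python
from itertools import combinations


def make_A0(n):
    """A_0 as a map from each state to its set of possible successors."""
    A0 = {s: set(range(1, n - s + 1)) for s in range(1, n)}   # s -> {1, ..., n - s}
    A0[n] = {1}
    return A0


def apply_action(action, K):
    """Sensorless forward projection A(K): union of the images of the states in K."""
    return set().union(*(action[s] for s in K))


def A0_never_jumps_ahead(n):
    """Check both observations from the text for every non-empty knowledge state."""
    A0 = make_A0(n)
    for k in range(1, n + 1):
        K_min = set(range(1, k + 1))
        K_max = set(range(n - k + 1, n + 1))
        for K in map(set, combinations(range(1, n + 1), k)):
            image = apply_action(A0, K)
            if K == K_max:
                # Last state of level k must map to the first state of level k - 1.
                if k >= 2 and image != set(range(1, k)):
                    return False
            elif not (image == K_min or len(image) > k):
                return False
    return True
```

The check treats a knowledge state as "not ahead" when its image is either the first state of the same level or lies in a higher level, exactly as in the argument above.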

For the case $n = 4$, $A_0$ is given by:

\[
\begin{array}{rrcl}
A_0 : & 1 & \mapsto & 1, 2, 3 \\
      & 2 & \mapsto & 1, 2 \\
      & 3 & \mapsto & 1 \\
      & 4 & \mapsto & 1,
\end{array}
\]

which maps between levels as follows:

Level 4:  {1,2,3,4}  --A_0-->  {1,2,3}
Level 3:  {1,2,3} ⇝ ... ⇝ {2,3,4}  --A_0-->  {1,2}
Level 2:  {1,2} ⇝ ... ⇝ {3,4}  --A_0-->  {1}
Level 1:  {1}

Here ⇝ refers to any action other than $A_0$.

Now we must define the remaining $n - 1$ actions. The purpose of each of these
will be to permit the system to advance between consecutive knowledge states in
the  lexicographic  ordering,  while  preventing  the  system  from  using  the  actions  to 
advance  more  than  one  step  in  the  ordering.  In  order  to  understand  the  definition 
of  these  actions,  we  will  look  at  how  to  form  the  successor  of  a  given  knowledge 
state  within  a  specific  level,  relative  to  the  lexicographic  ordering.  Again,  let  us 
introduce some temporary notation. First, for the time being, whenever we write a
knowledge state as a set, we will write its elements in order, so that the representation
of the state corresponds to its lexicographic label. In other words a knowledge state
$K = \{s_{i_1}, s_{i_2}, \ldots, s_{i_k}\}$ will be depicted in the form $K = \{i_1, i_2, \ldots, i_k\}$, with
$i_1 < i_2 < \cdots < i_k$. Second, if we are only interested in the last $\ell$ elements of the knowledge
state relative to this ordering, then we will write it as $\{\langle\ast\rangle, i_{k-\ell+1}, i_{k-\ell+2}, \ldots, i_k\}$. In
other words, the prefix "$\langle\ast\rangle$" will mean zero or more elements whose lexicographic
value is less than that of the elements that follow. If this symbol appears more than
once in an equation, then it is assumed to be bound to the same value throughout
the equation. And third, we will let SUCC denote the successor function relative to
the lexicographic ordering and the level in which a knowledge state is located.


Now consider the successor to a knowledge state $K$. $K$ is necessarily of the
form $\{\langle\ast\rangle, i_k\}$ for some $i_k$. If $i_k \neq n$ then $\mathrm{SUCC}(K) = \{\langle\ast\rangle, i_k + 1\}$. On the other
hand, if $i_k = n$, then we must consider the next to last entry, that is, we must
look at $i_{k-1}$ in the representation $K = \{\langle\ast\rangle, i_{k-1}, n\}$. Again, if $i_{k-1} \neq n - 1$ then
$\mathrm{SUCC}(K) = \{\langle\ast\rangle, i_{k-1} + 1, i_{k-1} + 2\}$. Notice that in this case the successor function
changes not only the next to last entry, but may also change the last entry. In
particular, the last entry is set to be exactly one more than the next to last entry.
This follows from the definition of a lexicographic order (without duplicates). Once
again, if $i_{k-1} = n - 1$, then we must look at the second to last entry $i_{k-2}$, and so forth.
In general, if we are required to look at the last $\ell$ entries, then $K$ must be of the form
$\{\langle\ast\rangle, i, n - \ell + 2, n - \ell + 3, \ldots, n\}$, for some $i$ with $1 \le i \le n - \ell$. Thus $\mathrm{SUCC}(K)$
is of the form $\{\langle\ast\rangle, i + 1, i + 2, i + 3, \ldots, i + \ell\}$. The only exception to these rules is
if $K = K^{(k)}_{\max}$ for some $k$. However, in that case, we are not interested in $\mathrm{SUCC}(K)$
anyway, as action $A_0$ applies.
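
The successor rule just described is short enough to write out. The following is a sketch under the assumption that a knowledge state is given as a collection of integer indices; the names `succ` and `agrees_with_itertools` are invented, and the second function merely cross-checks the rule against the lexicographic order produced by `itertools.combinations`.

```python
from itertools import combinations


def succ(K, n):
    """Lexicographic successor of K within its level, or None for the last state."""
    K = sorted(K)
    k = len(K)
    # Scan from the right for the first entry that is below its largest allowed value.
    for pos in range(k - 1, -1, -1):
        if K[pos] < n - (k - 1 - pos):
            i = K[pos]
            # The prefix is kept; the tail becomes i + 1, i + 2, ..., i + l.
            return K[:pos] + [i + 1 + d for d in range(k - pos)]
    return None


def agrees_with_itertools(n):
    """Compare succ with the order of itertools.combinations on every level."""
    for k in range(1, n + 1):
        level = [list(c) for c in combinations(range(1, n + 1), k)]
        for current, following in zip(level, level[1:]):
            if succ(current, n) != following:
                return False
        if succ(level[-1], n) is not None:
            return False
    return True
```

For instance, with $n = 4$ the knowledge state $\{1, 4\}$ requires looking at the last two entries, with $i = 1$, and its successor is $\{2, 3\}$.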


We will now define actions $A_1, \ldots, A_{n-1}$, where the purpose of action $A_i$ is to
change $K$ to $\mathrm{SUCC}(K)$ for all knowledge states of the form $K = \{\langle\ast\rangle, i, n - \ell + 2, n - \ell + 3, \ldots, n\}$,
for some $\ell$. In other words, if the relevant entry in determining the
successor of $K$ has value $i$, then $A_i$ will be the action that permits the system to
make progress towards the goal. Furthermore, none of the other actions will permit
progress at $K$.


From the previous discussion one sees that $A_i$ must be of the form:

\[
\begin{array}{rrcl}
A_i : & 1 & \mapsto & 1 \\
      & 2 & \mapsto & 2 \\
      & \vdots & & \\
      & i - 1 & \mapsto & i - 1 \\
      & i & \mapsto & i + 1 \\
      & i + 1 & \mapsto & 1, 2, \ldots, n \\
      & i + 2 & \mapsto & i + 2, i + 3, \ldots, n \\
      & \vdots & & \\
      & i + j & \mapsto & i + 2, i + 3, \ldots, n + 2 - j \\
      & \vdots & & \\
      & n - 1 & \mapsto & i + 2, i + 3 \\
      & n & \mapsto & i + 2
\end{array}
\]

Notice that $A_i$ leaves all states in the range $[1, i - 1]$ unchanged. This corresponds to
the "$\langle\ast\rangle$" entries in the representation $K = \{\langle\ast\rangle, i, n - \ell + 2, n - \ell + 3, \ldots, n\}$. Also,
$A_i$ advances $i$ to $i + 1$, which is the first entry changed by the successor function.
State $i + 1$ is non-deterministically sent to all possible states. This is done to preclude
use of $A_i$ when the relevant entry determining the successor of $K$ actually has value
$i + 1$. The remaining states $i + 2, \ldots, n$ are each sent non-deterministically to a
subset of themselves. These sets form a tower collapsing to $i + 2$, that ensures proper
computation of the successor function.
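
A small sketch may help in reading the definition of $A_i$. It assumes the reading in which state $i + j$, for $j \ge 2$, is sent to $\{i + 2, \ldots, n + 2 - j\}$; the function names are invented for the example.

```python
def make_Ai(i, n):
    """A_i (1 <= i <= n - 1) as a map from each state to its possible successors."""
    Ai = {}
    for s in range(1, n + 1):
        if s < i:
            Ai[s] = {s}                          # prefix entries are left unchanged
        elif s == i:
            Ai[s] = {i + 1}                      # the relevant entry advances by one
        elif s == i + 1:
            Ai[s] = set(range(1, n + 1))         # complete uncertainty blocks misuse
        else:
            j = s - i
            Ai[s] = set(range(i + 2, n + 3 - j))  # tower {i+2, ..., n+2-j}
    return Ai


def apply_action(action, K):
    """Sensorless forward projection A(K): union of the images of the states in K."""
    return set().union(*(action[s] for s in K))


# With n = 4, the relevant entry of {1, 3, 4} has value 1, so A_1 should yield {2, 3, 4}.
image = apply_action(make_Ai(1, 4), {1, 3, 4})
```

Here states 3 and 4 are sent to $\{3, 4\}$ and $\{3\}$ respectively, state 1 is sent to 2, and the union is the successor $\{2, 3, 4\}$. Applying $A_1$ to a knowledge state that contains state 2 would instead return all four states.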

We  will  now  prove  that  these  actions  do  indeed  define  a  task  for  which  there  exists 
a  guaranteed  solution  whose  length  necessarily  is  of  exponential  size.  Then  we  will 
instantiate  the  actions  and  the  strategy  for  the  case  n  =  4. 


Claim  3.17  For  the  actions  and  task  defined  above,  there  exists  a  guaranteed 
strategy  that  traverses  essentially  all  knowledge  states,  in  the  order  described  above. 
Furthermore,  there  is  no  shorter  guaranteed  solution. 

Proof.  First,  let  us  show  that  for  every  knowledge  state  containing  two  or  more 
states,  there  is  some  action  that  makes  progress  towards  the  goal.  Once  we  establish 
this,  the  existence  of  a  guaranteed  solution  of  the  type  described  is  established.  Recall 
that  progress  means  either  moving  to  a  successor  state,  or  moving  down  to  the  next 
level,  where  each  level  consists  of  knowledge  states  of  a  given  size. 

Let $K = \{i_1, \ldots, i_k\}$, with $i_1 < \cdots < i_k$, be given. As we already indicated,
if $K = K^{(k)}_{\max} = \{n - k + 1, n - k + 2, \ldots, n\}$, then $A_0$ will make progress at
$K$. Otherwise, determine the smallest index $\ell$ for which $K$ is of the form
$K = \{i_1, \ldots, i_{k-\ell}, i, n - \ell + 2, n - \ell + 3, \ldots, n\}$, with $i_{k-\ell+1} = i$ and $1 \le i \le n - \ell$. Use
$\ell = 1$ if $i_k < n$. Then action $A_i$ will make progress at $K$, by construction. This
follows from the following calculation (which makes use of the fact that $A_i(\{i_j\}) = \{i_j\}$
for $1 \le i_j < i$, and the fact that $A_i(\{n - \ell + j\}) = \{i + 2, i + 3, \ldots, i + \ell - j + 2\}$ for
$2 \le j \le \ell$).

\[
\begin{aligned}
A_i(K) &= \bigcup_{j=1}^{k} A_i(\{i_j\}) \\
       &= \Bigl(\bigcup_{j=1}^{k-\ell} A_i(\{i_j\})\Bigr) \cup A_i(\{i\}) \cup \Bigl(\bigcup_{j=2}^{\ell} A_i(\{n - \ell + j\})\Bigr) \\
       &= \{i_1, \ldots, i_{k-\ell}\} \cup \{i + 1\} \cup \{i + 2, i + 3, \ldots, i + \ell\} \\
       &= \mathrm{SUCC}(K).
\end{aligned}
\]

Now let us proceed in the other direction, and show that applying the wrong
action $A_i$ to a knowledge state cannot cause the system to advance in the ordering
outlined earlier. This will establish uniqueness of the solution, in the sense that there
is no shorter guaranteed strategy.

Let a knowledge state $K$ be given, and consider applying action $A_i$. We have
already shown that $A_0$ cannot make progress unless $K = K^{(k)}_{\max}$ for some $k$, so assume
that $i > 0$. Observe that if $i + 1 \in K$, then $A_i(K) = \{1, 2, \ldots, n\}$, that is, $A_i$
maps $K$ to complete uncertainty. This is definitely not progress, so we may as well
assume that $i + 1 \notin K$. Now suppose that in fact $K \subseteq \{1, 2, \ldots, i - 1\}$. Then
$A_i(K) = K$, which again means there is no progress. Similarly, if $K \subseteq \{1, 2, \ldots, i\}$,
then $A_i(K) \subseteq \{1, 2, \ldots, i - 1, i + 1\}$, which is progress, but now $K$ is of the form
for which $A_i$ was designed in the first place. So, we may assume that $K$ intersects
the set $\{i + 2, \ldots, n\}$. Let $\ell$ be the minimal element in $K \cap \{i + 2, \ldots, n\}$. Then
$A_i(K) \supseteq A_i(\{\ell\})$. Now write $K$ as

\[
K = \bigl(K \cap \{1, \ldots, i - 1\}\bigr) \cup \bigl(K \cap \{i + 2, \ldots, n\}\bigr) \cup \bigl(K \cap \{i\}\bigr).
\]

Given the minimality of $\ell$, this says that $|K| = |K \cap \{1, \ldots, i - 1\}| + |K \cap \{\ell, \ldots, n\}| + \chi_K(i)$,
where $\chi_K$ is the characteristic function of $K$. Applying action $A_i$, we see that

\[
A_i(K) = \bigl(K \cap \{1, \ldots, i - 1\}\bigr) \cup \{i + 2, i + 3, \ldots, i + n - \ell + 2\} \cup A_i\bigl(K \cap \{i\}\bigr),
\]

where $A_i(\{i\}) = \{i + 1\}$. Thus $|A_i(K)| = |K \cap \{1, \ldots, i - 1\}| + (n - \ell + 1) + \chi_K(i)$.
If $|A_i(K)| > |K|$, then $A_i$ is moving $K$ back up one or more levels, hence not making
progress, so consider the possibility that $|A_i(K)| \le |K|$. This is possible if and only
if $n - \ell + 1 \le |K \cap \{\ell, \ldots, n\}|$. Clearly, this inequality can at best be an equality, in
which case we must have that $K \supseteq \{\ell, \ldots, n\}$. Now there are two possibilities: either
$i \in K$ or not. In the first case, we have that $K$ is of the form $\{\langle\ast\rangle, i, \ell, \ell + 1, \ldots, n\}$, in
which case $A_i$ is designed to make progress at $K$. Thus, finally, assume that $i \notin K$.
So, $K = \{\langle\ast\rangle, \ell, \ell + 1, \ldots, n\}$ and $A_i(K) = \{\langle\ast\rangle, i + 2, \ldots, i + n - \ell + 2\}$, with $i + 2 \le \ell$.
But this says that either $A_i(K)$ is equal to $K$ or $A_i(K)$ precedes $K$ lexicographically.
In short, $A_i$ does not make progress at $K$. ∎

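Let us instantiate these actions for the case $n = 4$. We have

$$
\begin{array}{rlrlrl}
A_1: & 1 \mapsto 2       & A_2: & 1 \mapsto 1       & A_3: & 1 \mapsto 1 \\
     & 2 \mapsto 1,2,3,4 &      & 2 \mapsto 3       &      & 2 \mapsto 2 \\
     & 3 \mapsto 3,4     &      & 3 \mapsto 1,2,3,4 &      & 3 \mapsto 4 \\
     & 4 \mapsto 3,      &      & 4 \mapsto 4,      &      & 4 \mapsto 1,2,3,4.
\end{array}
$$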

The  guaranteed  solution  is  given  by: 


We  see  then  that  there  are  tasks  for  which  the  planning  and  execution  times  are 
exponential  in  the  size  of  the  input.  Observe,  however,  for  this  particular  example, 
that  if  the  initial  state  of  the  system  were  known  precisely,  then  there  would  be  a 
fast solution for attaining the goal. In particular, if the initial state is $s_1$ then the
system is already in the goal. If the initial state is either $s_3$ or $s_4$, then action $A_0$
will attain the goal in a single motion. Finally, if the initial state is $s_2$, then action
$A_2$ will cause a transition to state $s_3$, from which $A_0$ will attain the goal. In short,
if one writes out the dynamic programming table to two columns for this task, then
one has a collection $\{K_i\}$ of knowledge states that cover the entire state space. Thus
one  can  employ  a  randomized  strategy  that  guesses  the  initial  state  of  the  system, 
then  executes  a  short  sequence  of  actions  designed  to  attain  the  goal.  One  must,  of 
course,  add  a  goal  sensor,  in  order  to  ensure  reliable  goal  recognition. 

For  the  sake  of  completeness,  note  that  the  relevant  portion  of  the  backchaining 
diagram  corresponding  to  the  dynamic  programming  table  out  to  two  columns  is  given 
by  the  following  diagram  (depicting  vertical  levels  rather  than  horizontal  columns): 
$$
\begin{array}{c}
\{1\} \\
\uparrow A_0 \\
\{3,4\} \\
\uparrow A_2 \\
\{2\}
\end{array}
$$


In the general case, one must backchain out to the $(n-2)$nd column of the dynamic
programming table. A guessing strategy consists of guessing between the $n-1$ non-goal
states, then executing a strategy of no more than $n-2$ steps that is guaranteed to
attain the goal if the guess is correct. Thus the expected number of actions executed
until the goal is attained is on the order of $n^2$.
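The quadratic estimate is the mean of a geometric distribution multiplied by the cost of one attempt. The sketch below makes that arithmetic explicit. It assumes that each attempt guesses uniformly among the $n-1$ non-goal states, that guesses are independent across attempts, and that every attempt is charged its full $n-2$ steps; the function names are hypothetical.

```python
from fractions import Fraction

def expected_actions_bound(num_guesses, steps_per_attempt):
    """Upper bound on the expected number of actions when each attempt succeeds
    with probability at least 1/num_guesses and costs at most steps_per_attempt."""
    p = Fraction(1, num_guesses)
    expected_attempts = 1 / p              # mean of a geometric distribution
    return expected_attempts * steps_per_attempt

def guessing_bound(n):
    """Bound for the example in the text: n-1 candidate states, n-2 steps each."""
    return expected_actions_bound(n - 1, n - 2)
```

For instance, `guessing_bound(10)` evaluates to $9 \cdot 8 = 72$, in line with growth on the order of $n^2$.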

Notice  that  adding  a  goal  sensor  does  not  fundamentally  change  the  exponential 
character  of  the  guaranteed  strategy,  by  the  partial  equivalence  of  sensorless  and  near- 
sensorless  tasks,  as  established  in  section  3.13.2.  It  is  important  to  keep  this  partial 
equivalence  in  mind,  since  a  goal  sensor  clearly  permits  a  speedup  of  the  guaranteed 
solution  if  one  does  not  make  the  modifications  suggested  by  the  partial  equivalence. 
We  thus  have  the  following  claim. 

Claim  3.18  There  exists  a  near-sensorless  discrete  planning  problem  in  which  the 
shortest  guaranteed  strategy  has  exponential  length,  but  for  which  there  exists  a 
randomized  strategy  that  only  requires  quadratic  expected  time. 

Proof.  Most  of  this  claim  has  been  proved.  We  only  need  to  verify  that  there 
does  indeed  exist  a  linear  time  strategy  for  attaining  the  goal  if  the  initial  state  of 
the  system  is  known.  We  return  to  the  construction  above. 

First notice that action $A_0$ is guaranteed to move state $s_n$ and $s_{n-1}$ into the goal
in a single motion. Observe also that action $A_i$ is guaranteed to move state $s_i$ to state
$s_{i+1}$ for all $i$. This establishes the claim. ∎

In  retrospect,  the  randomizing  part  of  the  claim  is  not  very  surprising.  The 
actions $A_i$ are actually fairly deterministic. However, the solutions are not at all
commensurate. Said differently, the solution for a given initial state is not guaranteed
to serendipitously make progress at other states. This is quite unlike the fortunate
situation  that  we  encountered  with  one-dimensional  random  walks,  where  the  same 
solution  pretty  much  applied  to  all  possible  states.  Thus  the  surprising  aspect  of  the 
claim  is  the  exponential  character  of  the  guaranteed  solution  for  what  may  seem  to 
be  fairly  deterministic  actions. 


3.13.4  An  Exponential-Time  Randomizing  Example 

The  following  example  exhibits  a  (near-)sensorless  task  for  which  the  shortest 
guaranteed  solution  requires  an  exponential  number  of  steps  and  for  which  a 
randomized solution that guesses the starting state also requires exponential time.
The basic idea is to generate a problem in which the knowledge states play the role
of bit vectors that may be modified only by counting.

The example will consist of $n$ states and $2n-3$ actions. We will present
the example as if there is no sensing, bearing in mind the partial equivalence
between sensorless and near-sensorless problems of section 3.13.2. We retain some
of the notation from the previous example (section 3.13.3). In particular, we will
interchangeably refer to a state either as $s_i$ or as $i$, for $i = 1, \ldots, n$.

The state space will be of the form $S = \{s_1, \ldots, s_n\} = \{1, \ldots, n\}$, with the
goal being state $s_1$. We will denote the actions by the symbols $A_1, \ldots, A_{n-1}$ and
$B_1, \ldots, B_{n-2}$. We will write knowledge states as ordered tuples, as we did in the
previous section. In other words, a knowledge state $K$ of size $k$ will be written in the
form $K = \{s_{i_1}, \ldots, s_{i_k}\} = \{i_1, \ldots, i_k\}$, with $i_1 < \cdots < i_k$. Thinking of a knowledge
state as a bit vector, $K$ will correspond to the number $x(K)$, with

$$x(K) = \sum_{i \in K} 2^{\,n-i}.$$

Conversely, given an integer $x$ in the range $[0, 2^n - 1]$, there is a unique knowledge
state $K$ for which $x(K) = x$. We will denote this knowledge state by $K(x)$, with

$$K(x) = \{\, i \mid \text{bit \#}(n-i) \text{ is a 1 in the binary representation of } x \,\}.$$

As an example, if $n = 10$ and $K = \{1, 3, 7\}$, then $x(K) = 648$. Similarly, if $n = 4$
and $x = 9$, then $K(x) = \{1, 4\}$.
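The correspondence between knowledge states and integers is easy to state in code. The following sketch implements the two maps $x(K)$ and $K(x)$ exactly as defined above; the function names `x_of` and `K_of` are hypothetical.

```python
def x_of(K, n):
    """x(K): state i contributes bit number n - i, so state 1 is the top bit."""
    return sum(1 << (n - i) for i in K)

def K_of(x, n):
    """K(x): the set of states i whose bit number n - i is set in x."""
    return {i for i in range(1, n + 1) if (x >> (n - i)) & 1}
```

With these definitions `x_of({1, 3, 7}, 10)` gives 648 and `K_of(9, 4)` gives `{1, 4}`, matching the two examples in the text.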

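As before, we will let the prefix symbol $\langle\mathrm{pre}\rangle$ in the representation
$K = \{\langle\mathrm{pre}\rangle, i_1, \ldots, i_\ell\}$ denote zero or more elements whose lexicographic order precedes
that of $i_1$. This notation carries over to the binary representation of the number
$x(K)$. Comparing the binary representation of $x(K)$ with $K$, we have the following
schematic:

$$
\begin{array}{l|cccccccc}
\text{Bit \#:} & \cdots & & n-i_1 & & n-i_2 & \cdots & n-i_\ell & \cdots\ 0 \\
x(K): & \langle\mathrm{pre}\rangle & 0 \cdots 0 & 1 & 0 \cdots 0 & 1 & 0 \cdots 0 & 1 & 0 \cdots 0 \\
K: & \{\langle\mathrm{pre}\rangle, & & s_{i_1}, & & s_{i_2}, & \cdots & s_{i_\ell}\} &
\end{array}
$$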

The actions that we will construct will force the system to traverse an exponential
number of knowledge states, beginning with the state of complete uncertainty
$\{1, \ldots, n\}$, and ending with the goal state $\{1\}$, in an order that corresponds to
counting downwards from $2^n - 1$ to $2^{n-1}$. For the special case $n = 4$, this corresponds
to the following transitions (for later reference the transitions are also labelled with
the associated actions):

$$
\begin{array}{ccc}
K & x(K) & \text{(action leading to the next row)} \\ \hline
\{1,2,3,4\} & 15 & A_3 \\
\{1,2,3\}   & 14 & B_2 \\
\{1,2,4\}   & 13 & A_2 \\
\{1,2\}     & 12 & B_1 \\
\{1,3,4\}   & 11 & A_3 \\
\{1,3\}     & 10 & B_2 \\
\{1,4\}     & 9  & A_1 \\
\{1\}       & 8  &
\end{array}
$$

Let us first define the actions $\{A_k\}$. These are designed to count down from
knowledge states $K$ whose associated numbers $x(K)$ are odd. Since an odd number
contains a one in the least significant bit, the knowledge state must contain the state
$s_n$. The actions $\{A_k\}$ are designed to remove this state. We have, for $k = 1, \ldots, n-1$,

$$
A_k: \quad
\begin{array}{rcl}
1 & \mapsto & 1 \\
2 & \mapsto & 2 \\
  & \vdots  & \\
k & \mapsto & k \\
k+1 & \mapsto & 1, 2, \ldots, n \\
  & \vdots  & \\
n-1 & \mapsto & 1, 2, \ldots, n \\
n & \mapsto & k.
\end{array}
$$

[Note, of course, that if $k = n-1$, then $A_k : n-1 \mapsto n-1$.]

Similarly, the actions $\{B_k\}$ are designed to count down by one from knowledge
states whose associated numbers are even. Thus these actions must worry about
borrowing properly from higher order bits. We have, for $k = 1, \ldots, n-2$,

$$
B_k: \quad
\begin{array}{rcl}
1 & \mapsto & 1 \\
2 & \mapsto & 2 \\
  & \vdots  & \\
k & \mapsto & k \\
k+1 & \mapsto & k+2, k+3, \ldots, n \\
k+2 & \mapsto & 1, 2, \ldots, n \\
  & \vdots  & \\
n & \mapsto & 1, 2, \ldots, n.
\end{array}
$$

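For the special case $n = 4$, we have the following five actions:

$$
\begin{array}{rlrlrl}
A_1: & 1 \mapsto 1       & A_2: & 1 \mapsto 1       & A_3: & 1 \mapsto 1 \\
     & 2 \mapsto 1,2,3,4 &      & 2 \mapsto 2       &      & 2 \mapsto 2 \\
     & 3 \mapsto 1,2,3,4 &      & 3 \mapsto 1,2,3,4 &      & 3 \mapsto 3 \\
     & 4 \mapsto 1,      &      & 4 \mapsto 2,      &      & 4 \mapsto 3
\end{array}
$$

$$
\begin{array}{rlrl}
B_1: & 1 \mapsto 1       & B_2: & 1 \mapsto 1 \\
     & 2 \mapsto 3,4     &      & 2 \mapsto 2 \\
     & 3 \mapsto 1,2,3,4 &      & 3 \mapsto 4 \\
     & 4 \mapsto 1,2,3,4, &     & 4 \mapsto 1,2,3,4.
\end{array}
$$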
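The general definitions of $A_k$ and $B_k$ can be written as small table builders. This is a minimal sketch under the definitions given above; the names `make_A` and `make_B` and the dictionary representation (state to set of possible successor states) are choices made here for illustration.

```python
def make_A(k, n):
    """A_k for 1 <= k <= n-1: fix states 1..k, scatter k+1..n-1, send n to k."""
    full = frozenset(range(1, n + 1))
    action = {}
    for s in range(1, n + 1):
        if s <= k:
            action[s] = frozenset([s])
        elif s < n:
            action[s] = full               # complete uncertainty
        else:
            action[s] = frozenset([k])     # removes state n from the knowledge state
    return action

def make_B(k, n):
    """B_k for 1 <= k <= n-2: fix states 1..k, send k+1 to {k+2..n}, scatter the rest."""
    full = frozenset(range(1, n + 1))
    action = {}
    for s in range(1, n + 1):
        if s <= k:
            action[s] = frozenset([s])
        elif s == k + 1:
            action[s] = frozenset(range(k + 2, n + 1))   # the "borrow"
        else:
            action[s] = full
    return action

def all_actions(n):
    """The 2n - 3 actions of the example, keyed by name."""
    actions = {f"A{k}": make_A(k, n) for k in range(1, n)}
    actions.update({f"B{k}": make_B(k, n) for k in range(1, n - 1)})
    return actions
```

For $n = 4$, `all_actions(4)` produces the five tables listed above.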

Claim  3.19  For  the  actions  and  task  defined  above,  there  exists  a  guaranteed 
strategy  that  traverses  essentially  all  knowledge  states,  in  the  order  described  above. 
Specifically, the strategy traverses all knowledge states that contain state $s_1$. There
are $2^{n-1}$ such knowledge states. Furthermore, there is no shorter guaranteed solution.

Proof.  First,  let  us  show  that  for  every  knowledge  state  K  there  is  some  action 
that  makes  progress.  In  this  case  progress  means  that  the  number  determined  by  the 
bit-vector  representation  of  K  is  decreased.  In  fact,  we  will  exhibit  an  action  that 
decreases  x(K)  by  exactly  one. 

Suppose that $x(K)$ is odd. Let $k$ be the order of the least significant bit other
than bit #0 which is set to 1. Then

$$x(K) = \langle\mathrm{pre}\rangle\; 1\; \underbrace{0 \cdots 0}_{k-1}\; 1,$$

meaning that $K = \{\langle\mathrm{pre}\rangle, s_{n-k}, s_n\}$. Now note that
$A_{n-k}(K) = \{\langle\mathrm{pre}\rangle, s_{n-k}\}$, so
$x(A_{n-k}(K)) = x(K) - 1$, as desired.

On the other hand, suppose that $x(K)$ is even. Again, let $k$ be the order of the
least significant bit that is set to 1. Then $k \ge 1$, and

$$x(K) = \langle\mathrm{pre}\rangle\; 1\; \underbrace{0 \cdots 0}_{k}.$$

If $y = x(K) - 1$, then

$$y = \langle\mathrm{pre}\rangle\; 0\; \underbrace{1 \cdots 1}_{k}.$$

This says that $K = \{\langle\mathrm{pre}\rangle, s_{n-k}\}$ and that
$K(y) = \{\langle\mathrm{pre}\rangle, s_{n-k+1}, s_{n-k+2}, \ldots, s_n\}$. Now
note that $B_{n-k-1}(K) = K(y)$, as desired.

We have shown that, for any knowledge state $K$, there is a strategy for counting
down from $x(K)$. In particular, suppose $K = \{i_1, \ldots, i_\ell\}$, with $i_1 < \cdots < i_\ell$. If
$i_1 = 1$, then one can count from $x(K)$ down to $2^{n-1}$, at which point the goal is
attained. On the other hand, if $i_1 > 1$, then one can count down from $2^{n-1} + x(K)$ to
$2^{n-1}$, at which point the goal is attained. This amounts to pretending that $s_1 \in K$.
Alternatively, one could just count down from $x(K)$ to 1, which places the system in
state $s_n$. Applying action $A_1$ then attains the goal. If one looks at the details, these
two strategies are really the same. After all, the counting never involves changing the
bit corresponding to $s_1$.
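The counting rule from the first half of the proof can be sketched as a short simulation: at an odd $x(K)$ apply $A_{n-k}$, at an even $x(K)$ apply $B_{n-k-1}$, where $k$ is the order of the lowest set bit above bit 0. The code assumes the action definitions given earlier in this section; the function names are hypothetical.

```python
def x_of(K, n):
    """Bit-vector value of a knowledge state: state i contributes bit n - i."""
    return sum(1 << (n - i) for i in K)

def apply_action(kind, k, K, n):
    """Image of knowledge state K under A_k (kind == "A") or B_k (kind == "B")."""
    full = frozenset(range(1, n + 1))
    out = set()
    for s in K:
        if s <= k:
            out.add(s)                        # states 1..k are fixed by both families
        elif kind == "A":
            if s == n:
                out.add(k)                    # A_k sends n to k
            else:
                return full                   # k+1..n-1 yield complete uncertainty
        else:
            if s == k + 1:
                out.update(range(k + 2, n + 1))   # B_k borrows from bit n-k-1
            else:
                return full
    return frozenset(out)

def choose_action(K, n):
    """Action that decreases x(K) by exactly one (K must contain state 1, K != {1})."""
    x = x_of(K, n)
    k = next(b for b in range(1, n) if (x >> b) & 1)   # lowest set bit above bit 0
    return ("A", n - k) if x & 1 else ("B", n - k - 1)

def run_countdown(n):
    """Execute the strategy from complete uncertainty; return (kind, index, x) per step."""
    K = frozenset(range(1, n + 1))
    goal = frozenset([1])
    trace = []
    while K != goal:
        kind, k = choose_action(K, n)
        K = apply_action(kind, k, K, n)
        trace.append((kind, k, x_of(K, n)))
    return trace
```

For $n = 4$ the trace reproduces the transition table for that case, and in general the run takes $2^{n-1} - 1$ steps.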

Second,  we  must  show  that  applying  the  wrong  action  at  a  knowledge  state  cannot 
make  further  progress.  This  will  establish  that  the  strategy  just  outlined  is  the 
shortest  strategy  guaranteed  to  attain  the  goal. 

So,  suppose  that  knowledge  state  K  is  given,  and  let  x  =  x(K). 

Consider applying action $A_k$, for some $k$. If $K \cap \{s_{k+1}, \ldots, s_{n-1}\} \ne \emptyset$ then
$A_k(K) = \{s_1, \ldots, s_n\}$, which is certainly not progress. On the other hand, if
$K \subseteq \{s_1, \ldots, s_k\}$, then $A_k(K) = K$, which again is not progress. That leaves the
possibility that $K \subseteq \{s_1, \ldots, s_k\} \cup \{s_n\}$. Suppose that both $s_k \in K$ and $s_n \in K$.
Then $A_k$ is designed to make progress at $K$, so that's fine. On the other hand,
suppose that $s_k \notin K$ and $s_n \in K$. Then $K = \{\langle\mathrm{pre}\rangle, s_n\}$, while
$A_k(K) = \{\langle\mathrm{pre}\rangle, s_k\}$. Note
that $x(A_k(K)) > x(K)$, so this motion also does not make progress.

Consider applying action $B_k$, for some $k$. If $K \subseteq \{s_1, \ldots, s_k\}$, then $B_k(K) = K$,
which means no progress. If $K$ contains any elements from the set $\{s_{k+2}, \ldots, s_n\}$,
then $B_k(K)$ is the entire state space, that is, complete uncertainty. The remaining
case says that $K = \{\langle\mathrm{pre}\rangle, s_{k+1}\}$, but then $K$ is of the form for which $B_k$ was designed
to make progress. ∎
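For small $n$ the optimality part of the claim can be confirmed by exhaustive search. The sketch below runs a breadth-first search over knowledge states, starting from complete uncertainty, and reports the length of the shortest action sequence that reaches $\{1\}$. It assumes the definitions of $A_k$ and $B_k$ given earlier in this section and treats a sensorless strategy as a fixed action sequence; the function names are hypothetical.

```python
from collections import deque

def apply_action(kind, k, K, n):
    """Image of knowledge state K under A_k (kind == "A") or B_k (kind == "B")."""
    full = frozenset(range(1, n + 1))
    out = set()
    for s in K:
        if s <= k:
            out.add(s)
        elif kind == "A":
            if s == n:
                out.add(k)
            else:
                return full
        else:
            if s == k + 1:
                out.update(range(k + 2, n + 1))
            else:
                return full
    return frozenset(out)

def shortest_guaranteed_length(n):
    """Fewest actions taking {1..n} to {1}, found by BFS over knowledge states."""
    start = frozenset(range(1, n + 1))
    goal = frozenset([1])
    actions = [("A", k) for k in range(1, n)] + [("B", k) for k in range(1, n - 1)]
    dist = {start: 0}
    queue = deque([start])
    while queue:
        K = queue.popleft()
        if K == goal:
            return dist[K]
        for kind, k in actions:
            K2 = apply_action(kind, k, K, n)
            if K2 not in dist:
                dist[K2] = dist[K] + 1
                queue.append(K2)
    return None   # goal unreachable
```

The search returns $2^{n-1} - 1$ for the small cases checked, which is the length of the counting strategy.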

Observe that the previous proof also shows that if the state of the system is known
exactly, say $K = \{s_i\}$, then the only reasonable strategy for attaining the goal is to
count down to 1 from $2^{n-i}$, followed by an application of action $A_1$. This is because
applying the wrong action at a knowledge state essentially has one of two effects:
Either (1) the action does not change the knowledge state, or (2) the action yields
complete uncertainty. The exception to this rule is given by the effect on state $s_n$,
but this state lies one action away from the goal, and misapplying an action when
the system is in state $s_n$ only moves it further away.

Thus  we  have 
Claim 3.20 There exists a near-sensorless discrete planning problem in which the
shortest  guaranteed  strategy  has  exponential  length.  Furthermore,  the  expected 
running  time  of  any  randomized  strategy  is  also  exponential  in  the  number  of  states 
and  actions. 


3.13.5  Exponential-Sized  Backchaining 

The  following  example  demonstrates  that  there  are  sensorless  tasks  for  which  the 
dynamic  programming  approach  of  backchaining  can  generate  a  table  of  exponential 
size  even  if  one  only  backchains  a  linear  number  of  steps.  In  fact  we  will  exhibit  an 
example with $n$ states and $n^2 - n$ actions in which the knowledge state $S$ is obtained
in the $(n-1)$st column of the dynamic programming table, and in which an exponential
number of knowledge states are generated in between. Of course, this implies that
there exists a fast strategy for attaining the goal. Indeed, there is a linear-time
strategy.  Furthermore,  it  may  be  possible  to  arrive  at  that  strategy  quickly,  by  using 
an  approach  other  than  the  dynamic  programming  approach.  For  our  particular 
example  all  actions  will  be  deterministic.  This  immediately  says  that  there  is  a 
fast  planning  algorithm,  using  Natarajan’s  graph-searching  techniques  (see  [Nat86]). 
However,  one  can  easily  modify  the  actions  so  that  they  are  non-deterministic.  In 
short,  this  example  says  nothing  about  the  fundamental  complexity  of  planning  under 
uncertainty,  merely  something  about  planning  using  backchaining.  More  fundamental 
results  are  contained  in  [Pap]  and  [PT],  as  we  have  already  mentioned. 

The state space is $S = \{s_1, \ldots, s_n\} = \{1, \ldots, n\}$. The $n^2 - n$ actions are given by:

$$
A_{ij}(s_k) =
\begin{cases}
s_i, & \text{if } k = j \\
s_k, & \text{otherwise,}
\end{cases}
\qquad 1 \le i, j \le n,\ i \ne j.
$$

In other words, $A_{ij}$ collapses the two states $s_i$ and $s_j$ to the state $s_i$, while leaving all
other states invariant. There is no sensing.

We  will  start  the  backchaining  process  off  by  assuming  that  any  singleton  state 
is  a  goal.  In  other  words,  if  the  system  can  unambiguously  move  into  some  single 
state,  then  it  has  achieved  its  goal.  It  is  easy  to  change  this  problem  into  one  in 
which  the  system  must  attain  a  particular  goal  state,  by  adding  an  action  and  a  state 
to  the  construction.  In  any  event,  we  may  assume  that  column  number  zero  of  the 
dynamic  programming  table  contains  entries  for  all  knowledge  states  of  the  form  { k }, 
for  k  =  1,  •  •  •  ,n. 

Now suppose that the planner is backchaining from the $\ell$th column of the dynamic
programming table, and that all the non-blank entries in this column are of size
at most $\ell + 1$. Suppose further that the collection of non-blank entries includes all
knowledge states of size $\ell + 1$. Since no action collapses more than two states, it is
impossible to obtain knowledge states of size greater than $\ell + 2$ in the $(\ell+1)$st column.
However, it is possible to obtain all knowledge states of size $\ell + 2$. This says that
precisely in column number $(n-1)$ the knowledge state $S$ will have its entry filled
in for the first time. Furthermore, all other knowledge states will also have had their
entries filled in.
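The growth of the table can be reproduced with a direct simulation of backchaining over the collapsing actions $A_{ij}$. The sketch below records, for every knowledge state, the first column in which its entry is filled. It assumes that an entry is filled as soon as some action maps the knowledge state onto one already filled; exact membership suffices here because the filled entries are closed under taking subsets. The function names are hypothetical.

```python
from itertools import combinations

def collapse(i, j, K):
    """Image of knowledge state K under A_ij: state j is sent to i, others are fixed."""
    return frozenset(i if s == j else s for s in K)

def backchain_columns(n):
    """Map each non-empty knowledge state to the first column in which it is filled."""
    states = range(1, n + 1)
    actions = [(i, j) for i in states for j in states if i != j]   # n^2 - n actions
    all_K = [frozenset(c) for k in range(1, n + 1) for c in combinations(states, k)]
    first_column = {frozenset([s]): 0 for s in states}   # column 0: singleton goals
    column = 0
    while len(first_column) < len(all_K):
        column += 1
        newly = [K for K in all_K
                 if K not in first_column
                 and any(collapse(i, j, K) in first_column for i, j in actions)]
        if not newly:
            break
        for K in newly:
            first_column[K] = column
    return first_column
```

In this simulation every knowledge state of size $m$ first appears in column $m - 1$, so all $2^n - 1$ non-empty knowledge states are present by column $n - 1$.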


3.13.6  The  Odometer 

The  following  physical  device  has  important  commonalities  with  the  graph  example 
presented  in  section  3.13.3.  In  particular,  the  task  described  by  this  device  has  a 
guaranteed  solution  that  requires  an  exponential  number  of  steps,  and  a  randomized 
solution  that  only  requires  an  expected  linear  number  of  steps. 

Imagine  a  series  of  n  horizontal  plates  or  wheels  arranged  vertically  above  each 
other.  The  plates  are  connected  by  a  gearing  mechanism  that  acts  much  like  an 
odometer.  Specifically,  a  primitive  action  consists  of  turning  a  plate  one-tenth  of 
a  revolution.  Call  this  a  partial  turn.  Whenever  a  given  plate  turns,  it  also  turns 
the plate above it, but at one tenth the speed, so each time a plate makes one full
revolution,  the  plate  immediately  above  makes  a  partial  turn.  Similarly,  turning  a 
plate  turns  the  plate  directly  below  it  at  ten  times  the  speed.  There  is  a  crank  below 
the  bottom  plate  which  turns  that  plate,  and  consequently  all  other  plates  at  reduced 
speeds.  Under  certain  circumstances  mentioned  later,  individual  plates  may  also  be 
turned  directly.  The  crank  and  any  individual  plate  can  only  be  turned  at  a  specific 
fixed speed, say, one partial turn per unit time. (Turning an individual plate directly
also  turns  the  other  plates  via  the  gearing  mechanism,  as  described  earlier.) 

On  one  of  the  plates  is  a  ball.  The  ball  arrives  from  a  distribution  bin  which 
non-deterministically places the ball on a non-deterministically chosen plate. There
is  a  chute  next  to  each  plate.  Turning  the  plate  so  that  the  ball  passes  by  this  chute 
causes  the  ball  to  roll  off  the  plate,  down  the  chute,  and  onto  the  plate  below.  The 
chutes  are  themselves  arranged  in  unison  above  each  other.  They  are  hinged  to  a 
vertical  pole,  and  may  be  swung  away  from  the  plates.  In  this  case,  if  a  plate  is 
turned  so  that  the  ball  passes  by  the  location  at  which  the  chute  would  normally  be, 
the  ball  simply  drops  vertically.  If  the  ball  is  not  caught  by  someone,  it  reenters  the 
distribution  bin  and  is  once  again  non-deterministically  placed  on  a  plate.  The  plates 
cannot  be  turned  individually  when  the  chutes  are  in  place;  only  the  crank  may  be 
used.  However,  the  plates  may  be  turned  individually  when  the  chutes  have  been 
swung  away  from  the  plates. 

There  are  thus  two  ways  to  remove  a  ball  from  a  plate.  The  first  is  to  swing  the 
chutes  away  from  the  plates,  move  one’s  hand  up  to  the  plate  containing  the  ball, 
then  turn  the  plate  until  the  ball  falls  out  and  onto  one’s  hand.  The  second  way  is 
to swing the chutes into place, then turn the crank until the ball emerges from the
bottom  plate. 

The  first  approach  requires  turning  the  given  plate  at  most  10  partial  turns  before 
the ball falls out. The second approach may require turning the crank as many as
$\frac{10}{9}(10^n - 1)$ partial turns, should the ball happen to be on the top plate at the start.
Clearly,  assuming  that  one  can  determine  on  which  plate  the  ball  is  resting,  the  first 
approach  is  preferable. 

Now,  suppose,  however,  that  one  cannot  determine  on  which  plate  the  ball  is 
resting. For instance, the plates might be covered. Then the only guaranteed strategy
for  removing  the  ball  is  to  turn  the  crank  with  the  chutes  in  place,  until  the  ball 
emerges.  Turning  any  individual  plate,  with  the  chutes  swung  away,  runs  the  risk  of 
causing  the  ball  to  drop  from  a  plate,  forcing  it  back  into  the  distribution  bin.  From 
a  worst-case  point  of  view,  that  strategy  might  never  terminate.  Consequently,  the 
only  guaranteed  strategy  may  require  a  long  time  to  execute. 

Fortunately,  a  randomized  solution  consists  of  guessing  the  plate  on  which  the 
ball  is  resting,  then  acting  as  if  that  plate  did  indeed  hold  the  ball.  In  other  words, 
in  the  absence  of  a  sensor,  the  randomized  strategy  simulates  one.  With  probability 
1/n,  the  strategy  will  pick  the  correct  plate.  If  it  picks  the  wrong  plate,  then  the 
ball  is  repositioned,  and  the  strategy  can  try  again.  The  expected  number  of  partial 
turns until the ball emerges is thus bounded by $10n$. This is only a linear factor more
than in the case in which a sensor is available, well below the exponential guaranteed
strategy.⁵
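The bound of $10n$ and the exponentially decaying tail can both be computed directly. The sketch below assumes that each attempt independently picks the correct plate with probability $1/n$ and costs at most 10 partial turns; the function names are hypothetical.

```python
import math

def expected_partial_turns(n, turns_per_attempt=10):
    """Bound on expected partial turns: n expected attempts, each of bounded cost."""
    return n * turns_per_attempt

def prob_more_than(n, t):
    """Probability that the first t*n attempts all fail, each succeeding w.p. 1/n."""
    return (1.0 - 1.0 / n) ** (t * n)

def tail_bound(t):
    """Upper bound e^{-t} on prob_more_than(n, t), from 1 - 1/n <= e^{-1/n}."""
    return math.exp(-t)
```

For example, with $n = 5$ plates the chance of needing more than $3n = 15$ attempts is $(4/5)^{15}$, which lies below $e^{-3}$.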

3.14  Summary 

This  chapter  considered  the  problem  of  planning  in  the  presence  of  uncertainty 
in  discrete  spaces.  The  standard  dynamic  programming  approach  was  extended 
to  include  an  operator  that  would  purposefully  make  randomizing  choices.  The 
motivation  for  including  this  operator  was  to  extend  the  class  of  solvable  tasks  beyond 
those solvable by guaranteed strategies. Not all tasks admit what are traditionally
considered guaranteed solutions. These are solutions that are certain to accomplish
their  tasks  in  a  fixed  and  bounded  number  of  run-time  operations  that  may  be 
ascertained  at  planning  time.  There  are  many  tasks  that  one  would  consider  solvable 
simply  because  they  may  be  accomplished  frequently  even  if  not  always.  By  placing 
a  loop  around  a  strategy  that  tries  to  solve  such  a  task,  one  can  often  be  certain  of 
a  solution  eventually.  Although  in  principle  the  solution  could  require  an  unbounded 
amount  of  time,  often  one  may  be  able  to  compute  the  expected  time  until  the 
task  is  solved.  In  particular,  by  purposefully  randomizing  its  decisions  a  strategy 
can  sometimes  enforce  a  minimum  probability  of  success  on  any  particular  attempt, 
thereby  placing  an  upper  bound  on  the  expected  time  until  task  completion. 

The  basic  scheme  is  to  compute  partial  plans  that  are  guaranteed  to  accomplish 
portions  of  the  task.  Generally  these  partial  plans  will  only  succeed  if  fairly  stringent 
initial  conditions  are  satisfied.  While  any  particular  plan’s  preconditions  may  not  be 
satisfiable,  the  union  of  all  the  preconditions  may  be  satisfiable.  This  means  that  in 
fact  some  partial  plan’s  preconditions  are  satisfied,  but  due  to  uncertainty  the  system 
cannot  ascertain  which  plan’s  preconditions.  In  that  case  it  makes  sense  to  guess  the 
appropriate  partial  plan.  Effectively  the  strategy  is  executing  a  randomizing  action 
by guessing which partial plan is applicable. If the guess is correct, then the task will
be accomplished. Otherwise, the system will need to guess again, until it eventually
accomplishes the task.

⁵ Of course, the randomized strategy may require more than the expected number of trials to
succeed on any particular execution. However, the probability of requiring several factors of this
expectation decreases exponentially quickly in the number of factors.

Of particular interest were simple feedback loops. These are strategies that only
consider current sensed values in deciding on motions to execute. Such strategies are
often useful when there is some progress measure on the state space that measures
the system's distance from task completion. Whenever possible, the system will
execute an action that makes progress relative to the progress measure. Otherwise,
the system will execute a randomizing motion. The purpose of the randomizing
motion is to either accomplish the task or move to some location from which the
available sensory information again permits progress. In this context the chapter
explored various types of random walks. It was shown that if the expected speed
of progress is uniformly bounded away from zero, then it is possible to bound the
expected time until task completion. The bound is the intuitively desirable bound
of distance divided by expected velocity, where distance is defined by the progress
measure.


Chapter  4 
Preimages 


In  this  and  the  next  chapter  we  will  turn  our  attention  to  continuum  spaces,  primarily 
spaces such as $\Re^n$. The same ideas that appeared in the chapter on discrete planning
problems  will  appear  in  the  context  of  continuous  planning  problems.  In  particular, 
the  notions  of  expected  progress  and  randomization  by  guessing  starting  states  will 
carry  over  naturally  and  prove  useful.  Rather  than  develop  the  whole  framework 
afresh,  we  will  focus  on  particular  examples  and  results  that  should  make  the 
connection  between  the  continuous  and  discrete  cases  clear. 


4.1  Preimage  Planning 

In  the  chapter  on  discrete  planning  problems,  planning  with  uncertainty  was  viewed 
as  planning  in  the  space  of  knowledge  states.  This  view  effectively  reduced  the 
problem  of  finding  a  guaranteed  strategy  in  a  space  with  both  imperfect  control  and 
imperfect  sensing  to  a  backchaining  problem  in  a  space  with  imperfect  control  and 
perfect  sensing.  Backchaining  was  implemented  by  dynamic  programming,  using  a 
boolean cost function. A similar approach applies in continuous spaces. The preimage
planning  approach  developed  by  [LMT]  formally  introduced  this  notion  into  robotics. 
We  will  briefly  review  this  approach  in  this  section.  The  domain  will  be  taken  to 
be  the  configuration  space  of  the  robot  or  part  being  moved  relative  to  whatever 
obstacles  there  may  be  in  the  environment  (see  [Loz83]).  We  will,  however,  often 
restrict ourselves to $\Re^n$, for some $n$, with polyhedral obstacles. This might correspond
to  the  configuration  space  of  either  a  cartesian  robot  or  a  polyhedral  part  which  is 
only  permitted  to  translate  but  not  to  rotate. 

Uncertainty 

First  let  us  define  uncertainty.  We  have  already  indicated  that  sensing  errors  are 
modelled  as  bounded  error  balls.  Thus,  if  the  system  is  in  state  x  at  execution  time, 
then the position sensor may return a value $x^*$ that lies within some distance $\epsilon_s$ of
$x$. In the language of chapter 3, once we postulate full sensing consistency, then the
collection of possible sensory interpretation sets is given by the collection of balls
$\{B_{\epsilon_s}(x^*)\}$, as $x^*$ varies over $B_{\epsilon_s}(x)$. If the sensors are more complicated than this,
in  particular  if  there  are  sensors  that  measure  other  attributes  of  the  system,  such 
as  velocity  or  force,  then  one  can  model  this  by  increasing  the  dimensionality  of  the 
state  space  of  the  system  to  include  these  other  attributes.  Alternatively,  if  the  future 
state  of  the  system  does  not  depend  on  these  attributes,  then  one  need  not  raise  the 
dimensionality  of  the  planning  space.  Instead,  one  can  model  the  additional  sensing 
information  by  projecting  it  into  the  original  state  space.  For  example,  the  measured 
force  may  indicate  that  the  object  is  in  contact  with  some  surface  S.  This  generally 
reduces  the  position  sensing  uncertainty,  by  selecting  a  lower  dimensional  slice  of  the 
sensing  error  ball,  corresponding  to  the  intersection  of  the  surface  with  the  position 
interpretation, that is, $S \cap B_{\epsilon_s}(x^*)$. There are some subtleties here. For instance,
the  interpretation  of  a  force  or  a  velocity  may  depend  on  the  action  executed.  This 
means  that  possible  sensory  interpretation  sets  must  now  be  modelled  not  only  as 
functions  of  the  state  of  the  system,  but  also  as  functions  of  the  commanded  action. 
While  we  did  not  model  this  dependence  in  the  discrete  setting,  doing  so  does  not 
pose  any  fundamental  difficulties.  Having  said  all  this,  we  will  basically  ignore  sensing 
of  attributes  other  than  position  in  our  examples.  For  more  detailed  investigations 
of  sensing  in  the  context  of  preimage  planning  see  [LMT],  [Mas84],  [Erd84],  [Buc], 
[Don89], [Can88], [Lat], among others.

Control  uncertainty  is  defined  similarly.  At  execution  time,  whenever  a  nominal 
control  command  is  issued,  the  actual  effect  on  the  system  is  given  by  a  range  of 
effective  commands  that  lie  in  some  error  ball  about  the  nominal  command.  More 
general  models  of  control  uncertainty  are  of  course  possible.  Within  the  LMT 
preimage  methodology,  the  envisioned  commands  are  either  applied  forces  or  applied 
velocities.  In  fact,  LMT  focuses  on  an  equivalence  between  forces  and  velocities 
given  by  modelling  dynamics  as  generalized  damper  dynamics,  an  assumption  that 
produces  a  first-order  system.  Specifically,  control  commands  are  nominal  velocities 
v0;  the  evolution  of  the  system  is  governed  by  the  first-order  equation 

(4.1)  $F = B\,(v - v_0^*)$,

where $v$ is the actual velocity of the system, $F$ is the force exerted by the environment
on the system, $B$ is a damping matrix, and $v_0^*$ is the effective commanded velocity. The
damping matrix is often simply taken to be the identity matrix, perhaps multiplied by
some gain factor. Control uncertainty is represented by the term $v_0^*$. This is assumed
to lie in some error ball $B_{\epsilon_c}(v_0)$ about the nominal commanded velocity $v_0$. See figure
4.1.  It  is  sometimes  convenient  to  think  of  the  velocity  error  as  defining  an  error  cone. 
This  cone  represents  the  trajectories  that  can  locally  emanate  from  a  given  point. 
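
As an illustration of equation (4.1), consider motion in free space, where the environment exerts no force ($F = 0$) and the damping matrix is invertible, so that the actual velocity equals the effective commanded velocity. The following Python sketch samples an effective command from the velocity error ball and computes the half-angle of the corresponding error cone. The function names, the restriction to the plane, and the uniform sampling are assumptions made purely for illustration; the uncertainty model itself says nothing about how commands are distributed within the ball.

```python
import math
import random


def sample_effective_velocity(v0, eps_c, rng=random):
    """Return a planar velocity drawn from the open ball of radius eps_c about v0.

    Uniform sampling is an illustrative assumption; the worst-case model only
    states that the effective command lies somewhere in the ball.
    """
    if eps_c == 0:
        return (v0[0], v0[1])          # no control uncertainty
    while True:
        dx = rng.uniform(-eps_c, eps_c)
        dy = rng.uniform(-eps_c, eps_c)
        if dx * dx + dy * dy < eps_c * eps_c:   # rejection step keeps the ball open
            return (v0[0] + dx, v0[1] + dy)


def error_cone_half_angle(v0, eps_c):
    """Half-angle (radians) of the cone of directions spanned by the error ball.

    Only defined when eps_c < |v0|; otherwise the ball contains the zero
    velocity and no proper cone exists.
    """
    speed = math.hypot(v0[0], v0[1])
    if eps_c >= speed:
        raise ValueError("error ball contains the origin; no proper cone")
    return math.asin(eps_c / speed)


def free_space_step(x, v0, eps_c, dt, rng=random):
    """Advance a point by dt with F = 0, so the actual velocity is the effective command."""
    v = sample_effective_velocity(v0, eps_c, rng)
    return (x[0] + dt * v[0], x[1] + dt * v[1])
```

Repeated calls to `free_space_step` trace out one of the trajectories inside the error cone; an adversarial choice of effective velocity would replace the random sampler.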

Generalized  damper  dynamics  are  convenient,  since  they  model  the  (error-free) 
trajectories  of  the  system  as  piecewise  linear  motions.  Similarly,  in  the  presence  of 
uncertainty,  the  possible  trajectories  may  be  modelled  as  cones.  For  further  discussion 
on generalized damper dynamics see [Whit77] and [LMT]. We will henceforth assume
that the dynamics are generalized damper dynamics in $\mathbb{R}^n$, with polyhedral obstacles.

Figure 4.1: Velocity error ball about a nominal velocity command. If one is only
interested in directions, it is sometimes useful to think of the error as an error cone.

Observe that these models of uncertainty are bounded worst-case models. In
other words, nothing is said about the actual distribution of sensor values or control
commands  within  the  uncertainty  balls.  The  distributions  may  be  probabilistic,  they 
may  be  fixed  biases,  or  they  may  even  be  chosen  in  a  worst-case  manner  by  an 
adversary. 

For  future  convenience  we  will  also  assume  that  the  sensing  and  control  error  balls 
are  all  open  balls. 

Preimages  and  Termination  Predicates 

Integral  to  the  planning  of  guaranteed  strategies  is  the  notion  of  a  preimage. 
Intuitively,  a  preimage  of  a  collection  of  goals  is  a  region  in  state  space  from  which 
a  certain  action  is  guaranteed  to  attain  one  of  the  goals,  and  do  so  in  a  recognizable 
manner.  Goals  are  themselves  modelled  geometrically  as  regions  in  state  space. 
Forming  the  preimages  of  a  goal  is  analogous  to  backchaining  one  column  in  the 
dynamic  programming  table.  However,  it  is  not  exactly  the  same  thing.  Into  the 
definition  of  a  preimage  enters  the  notion  of  a  termination  predicate.  The  termination 
predicate  is  the  decision  process  that  terminates  a  motion  at  run-time,  signalling 
goal  attainment.  The  amount  of  information  that  a  termination  predicate  considers 
determines  the  power  of  the  planning  system  to  solve  certain  tasks.  Essentially,  the 
termination  predicate  performs  the  forward  projection  of  states  and  the  intersection 
with sensory interpretation sets discussed in the discrete setting. If a termination
predicate  considers  only  current  sensed  values  in  deciding  goal  attainment,  then,  in 
the  terminology  of  chapter  3,  one  has  a  planning  problem  involving  strategies  that 
are  simple  feedback  loops.  If  the  termination  predicate  considers  all  possible  past 
sensed  values  as  well  as  time-indexed  forward  projections  then  one  has  a  planning 
problem  analogous  to  the  full  dynamic  programming  approach  discussed  in  the 
discrete setting. There are numerous intermediate possibilities, some of which did not
seem as evident in the discrete case. One important variation is to forward project the
start region under a given commanded velocity, but then to use only current sensor
values intersected with this forward projection in determining goal attainment. See
[Erd86]. See also page 209 for further discussion of termination predicates.
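
To make the difference between two of these termination predicates concrete, the following Python sketch works in a one-dimensional state space in which the goal, the sensing ball, and the forward projection are all open intervals. One predicate uses only the current sensed position; the other first intersects the sensing interval with a forward projection that is supplied as data. The interval representation and all function names are assumptions of this sketch, not part of the LMT formulation.

```python
def interval_intersection(a, b):
    """Intersect two open intervals given as (lo, hi); return None if empty."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo < hi else None


def contained_in(a, b):
    """True if the open interval a lies inside the open interval b."""
    return b[0] <= a[0] and a[1] <= b[1]


def terminate_current_sensing(x_star, eps_s, goal):
    """Signal goal attainment using only the current sensed position x_star."""
    interpretation = (x_star - eps_s, x_star + eps_s)
    return contained_in(interpretation, goal)


def terminate_with_forward_projection(x_star, eps_s, goal, forward_projection):
    """Signal goal attainment after intersecting the sensing ball with the
    forward projection of the start region."""
    interpretation = (x_star - eps_s, x_star + eps_s)
    knowledge = interval_intersection(interpretation, forward_projection)
    return knowledge is not None and contained_in(knowledge, goal)
```

For example, with goal `(0.0, 1.0)` and sensing error `0.6`, the sensing interval is wider than the goal, so the first predicate can never terminate. If the forward projection is known to be `(0.0, 3.0)`, say because a wall bounds the motion at 0, then a sensed value of `0.3` yields the knowledge interval `(0.0, 0.9)` and the second predicate signals success.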


Knowledge  States 

One  important  characteristic  of  the  termination  predicates  employed  in  the  LMT 
framework  is  their  Markovian  nature.  This  means  that  the  entire  information 
available  to  a  termination  predicate  at  any  given  time  may  be  summed  up  in  a  single 
set  describing  the  possible  configurations  of  the  system.  In  the  discrete  setting  this 
set  was  referred  to  as  a  knowledge  state.  The  existence  of  such  a  knowledge  state 
assumes that the state space of the system is Markovian as well, that is, that the
future behavior of the system depends only on the current state of the system and the
action being executed. It also assumes, as we have throughout the thesis, that
the sensor values obtained at execution time depend only on the current state of the
system. An implication of this observation is that a termination predicate can forget
the  exact  sensor  values  and  forward  projections  that  gave  rise  to  the  current  knowledge 
state.  Equivalently,  supplying  a  termination  predicate  with  a  given  knowledge  state 
and  starting  a  motion  from  anywhere  inside  the  set  of  configurations  described  by 
that  knowledge  state  permits  the  termination  predicate  to  make  precisely  the  same 
decisions  that  it  would  have  made  if  it  had  encountered  the  same  knowledge  state 
during a motion that had originated from some other region at some prior time. See
[Mas84] for a description of how a termination predicate functions.


Actions  and  Time-Steps 

One  aspect  may  be  troubling  in  comparing  the  discrete  and  continuous  settings.  In 
the  continuous  setting  it  seems  that  one  always  needs  a  termination  predicate  to 
stop a motion. After all, the basic commands are velocities, so one needs some form
of  termination  to  switch  between  different  velocities.  In  contrast,  in  the  discrete 
setting,  termination  predicates  were  never  explicitly  required.  Instead,  each  step 
involved  some  action,  which  terminated  by  definition,  whereupon  the  available  sensory 
information  was  used  to  select  a  new  action.  In  fact,  the  analogy  between  the 
discrete  and  continuous  settings  becomes  apparent  if  one  considers  actions  to  be 
velocities  executed  over  some  duration  of  time.  In  particular,  velocities  executed  over 
infinitesimal  time,  or  over  the  cycle  time  of  the  control  loop,  form  the  natural  analogue 
in  the  continuous  case  of  the  single-step  actions  in  the  discrete  case.  Conversely,  a 
velocity  executed  until  some  termination  predicate  signals  goal  attainment  has  as 
counterpart  in  the  discrete  setting  a  repeated  application  of  the  same  action  until 
some condition that is a combination of sensory information and history is satisfied.

Figure 4.2: The task is to slide the peg into the hole. Given large position sensing
uncertainty, a simple feedback loop that does not remember its past state will become
confused near the hole.


History 

Notice  that  once  one  establishes  that  primitive  actions  are  really  velocities  executed 
over  a  small  duration  of  time,  then  the  notion  of  a  simple  feedback  loop  makes  sense 
both  in  the  discrete  and  continuous  cases.  It  is  simply  a  control  loop  in  which  at  each 
instant  in  time  the  command  issued  depends  only  on  the  current  sensed  values.  In 
contrast, the notion of a preimage which employs an action over an extended period
of time tacitly includes some history. This history may simply be the information
implicit in knowing that a termination predicate will eventually signal success. As
an  example,  consider  the  task  of  sliding  a  peg  into  a  hole,  as  in  figure  4.2.  If  position 
sensing  uncertainty  is  large,  then  the  system  cannot  know  which  side  of  the  hole  the 
peg  is  on  once  it  is  near  the  hole.  Thus  a  simple  feedback  loop  would  have  to  resort 
to  randomization  as  used  in  the  example  of  section  2.4.  On  the  other  hand,  if  the 
system  is  far  enough  away  from  the  hole,  then  it  can  decide  which  way  to  move. 
Having  chosen  a  motion  direction,  and  a  termination  predicate  that  recognizes  goal 
attainment  by  noting  that  the  peg  is  falling  into  the  hole,  the  system  can  proceed 
to  move  in  the  correct  direction,  ignoring  all  sensor  values  except  the  final  one  that 
signals  goal  attainment.  In  short,  there  are  two  preimages,  corresponding  to  being  far 
enough  to  the  left  or  right  of  the  hole.  And  although  it  is  true  that  the  termination 
predicate  does  not  need  history  to  recognize  goal  attainment,  the  strategy  employs 
history in knowing that certain sensor values are irrelevant. The history is implicitly
used to rule out the confusion that a simple feedback loop would encounter. This
is an important distinction, which makes clear that a preimage in the continuous
setting corresponds to a special type of strategy with history in the discrete setting.
In particular, a preimage is a strategy that locally is guaranteed to make progress
towards the goal.

Preimages:  Definition 

The preimage $R$ relative to a commanded velocity $v_0$ of a collection of goals $\{G_\alpha\}$ is
specified implicitly as the solution to an equation of the form

$P_{v_0,R}(\{G_\alpha\}) = R$.

Here the operator $P_{v_0,R}$ defines a subset of the region $R$ from which recognizable
goal attainment is guaranteed. Recognizable attainment means that the termination
predicate will successfully halt the motion, specifying which goal $G_\alpha$ has been
attained.  The  termination  predicate  is  given  the  start  region  R  as  data,  and  may  use 
this  data  in  deciding  whether  the  goal  has  been  attained.  Of  course,  the  termination 
predicate  need  not  use  R.  For  instance,  if  the  termination  predicate  being  employed 
only  considers  current  sensed  values,  then  it  would  ignore  the  start  region  R.  See 
[LMT], [Mas84], [Erd86], and [Don89] for further details on the specification of the
preimage  equation.  We  will  content  ourselves  here  with  this  brief  explanation,  bearing 
in  mind  the  planning  approach  discussed  in  the  chapter  on  discrete  planning  problems. 

Planning  by  Backchaining 

Planning  a  guaranteed  strategy  consists  of  backchaining  preimages,  much  like  in 
the  dynamic  programming  approach.  The  analogy  in  the  discrete  setting  would  be 
to backchain several substrategies each of which makes progress locally until some
subgoal is attained. In the continuous case the formal definition proceeds as follows
(see [LMT] and [Mas84] for further details). Let $\mathcal{G}_0 = \{G_\alpha\}$ be the collection of
task-level goals. Now, suppose that $\mathcal{G}_k$ is defined as some collection of subgoals to be
attained. One backchains by forming all preimages $R_{\beta,k+1}$ which satisfy the preimage
equation $P_{v_0,R_{\beta,k+1}}(\mathcal{G}_k) = R_{\beta,k+1}$, for some commanded velocity $v_0 = v_0(R_{\beta,k+1})$ that
depends on the actual preimage. This collection of preimages forms the collection of
subgoals for the next level of backchaining, that is, $\mathcal{G}_{k+1} = \{R_{\beta,k+1}\}_{\beta \in B}$, where $B$ is
some appropriate index set. Planning stops either when some preset limit on $k$ has
been reached, or when no further preimages can be computed. The task is said to be
solvable if the initial knowledge state of the system $\mathcal{I}$ is contained in some preimage
generated during this backchaining process. $\mathcal{I}$ is a subset of the state space that is
known to contain the actual initial state of the system. Executing a strategy entails
collapsing this recursion, just as in the discrete case. In other words, given that the
system is in a preimage $R_{\beta,k} \in \mathcal{G}_k$ at the $k$th level, the system executes action $v_0(R_{\beta,k})$
until some subgoal $R_{\beta',k-1}$ in the $(k-1)$st level $\mathcal{G}_{k-1}$ is attained. This process is repeated
until a task goal $G_\alpha$ is attained. We refer to such a strategy as a guaranteed strategy
since it is certain to attain a task goal in a specific number of steps. This stands
in contrast to a randomized strategy, which only has some probability of attaining a
task goal and thus may fail to solve a task in any fixed number of steps.
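
The backchaining recursion can be sketched on a finite abstraction. The Python code below is a discrete analogue only: actions are single steps with non-deterministic outcomes, goal recognition is assumed to be perfect, and a preimage is simply the set of states from which every possible outcome of an action lies in the current collection of subgoals. The `outcomes` interface, the pruning of preimages that add no new states, and the toy example are all assumptions of this sketch; computing preimages in the continuous setting is considerably harder.

```python
def preimage(goal_regions, action, states, outcomes):
    """Return the states from which `action` is guaranteed to reach some goal region.

    `outcomes(state, action)` is an assumed interface returning the set of
    states that may result from one application of the action.
    """
    target = set().union(*goal_regions)
    region = set()
    for s in states:
        results = outcomes(s, action)
        if results and results <= target:   # every possible outcome is a goal state
            region.add(s)
    return frozenset(region)


def backchain(task_goals, actions, states, outcomes, max_levels):
    """Build levels G_0, G_1, ...; each level maps a region to its action."""
    levels = [{frozenset(g): None for g in task_goals}]
    covered = set().union(*levels[0])
    for _ in range(max_levels):
        new_level = {}
        for action in actions:
            region = preimage(levels[-1], action, states, outcomes)
            # Keep a preimage only if it adds states not already covered
            # (a pruning choice of this sketch, not part of the definition).
            if not region <= covered:
                new_level[region] = action
        if not new_level:
            break                            # no further useful preimages
        levels.append(new_level)
        covered |= set().union(*new_level)
    return levels


def solvable(levels, initial_region):
    """True if the initial knowledge state lies inside some generated preimage."""
    initial = set(initial_region)
    return any(initial <= region for level in levels for region in level)


# Toy example: states 0..4 on a line, goal {0}; "left" moves one or two cells.
def toy_outcomes(state, action):
    return {max(state - 1, 0), max(state - 2, 0)}


toy_levels = backchain([{0}], ["left"], range(5), toy_outcomes, max_levels=10)
```

In the toy example each level of backchaining extends the covered region by one cell, so an initial knowledge state of `{3, 4}` becomes solvable only once four levels have been generated.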


4.2  Guessing  Strategies 

With  these  preimage  definitions  in  hand,  one  can  now  define  the  guessing  operator 
SELECT  for  the  continuous  case.  For  the  case  of  initial-state-guessing  this  amounts  to 
backchaining preimages until one has a collection $\{R_{\beta,k}\}_{\beta \in B}$ at the $k$th level that covers
the initial state of the system $\mathcal{I}$. A randomized strategy consists of randomly selecting
one of these $R_{\beta,k}$ as the guessed starting region, then executing the guaranteed
strategy for $R_{\beta,k}$. The strategy is a guaranteed strategy for attaining a task-level goal,
in the sense that the strategy would reliably and recognizably attain one of the $G_\alpha$ if
the system knew for certain that its starting state was in the preimage $R_{\beta,k}$. However,
the starting state is merely guessed, and thus the usual admonishments regarding
reliable goal recognition and reliable restart of the strategy apply [see section 3.9 for
the discrete case]. For this reason we will assume, as we did in the discrete case, that
the task-level goals $\{G_\alpha\}$ are recognizable. This means that if the system is ever in
one of the sets $G_\alpha$, it will know so based purely on current sensing and not on the
history of the motion. Similarly, we will assume that the system never strays out of
some region $\mathcal{V}$, where $\mathcal{I} \subseteq \mathcal{V} \subseteq \bigcup_{\beta \in B} R_{\beta,k}$. In other words, the sets $\{R_{\beta,k}\}_{\beta \in B}$ may
be used repeatedly for restarting the guessing loop.
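
The guessing loop just described can be sketched as follows, assuming a finite collection of covering regions and a uniform guess among them. The callables `strategy_for`, `execute`, and `goal_recognized` are hypothetical interfaces introduced for this sketch, and the toy world assumes that an incorrect guess leaves the state unchanged, which is a strong simplification.

```python
import random


def guessing_loop(regions, strategy_for, execute, goal_recognized,
                  max_guesses, rng=random):
    """Guess a start region, run its strategy, and repeat until the goal is recognized.

    Returns the number of guesses used, or None if max_guesses was exhausted.
    """
    for attempt in range(1, max_guesses + 1):
        guess = rng.choice(regions)        # SELECT: uniform choice from a finite cover
        execute(strategy_for(guess))       # guaranteed to succeed only if the guess is right
        if goal_recognized():              # relies on current sensing only
            return attempt
    return None


class ToyWorld:
    """Hidden state is 'L' or 'R' of a hole; the goal is the hole itself."""

    def __init__(self, side):
        self.side = side
        self.at_goal = False

    def execute(self, direction):
        # Moving toward the hole attains the goal; a wrong guess is assumed
        # to leave the state unchanged.
        if (self.side, direction) in {("L", "right"), ("R", "left")}:
            self.at_goal = True

    def goal_recognized(self):
        return self.at_goal
```

A call such as `guessing_loop(["L", "R"], {"L": "right", "R": "left"}.get, world.execute, world.goal_recognized, 64)` with `world = ToyWorld("L")` returns the number of guesses needed. The loop terminates only because the goal is recognizable from current sensing and the regions remain a valid cover after each failed attempt.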

The discussion of randomized strategies that guess the initial state of the system
extends to the more general case of randomized strategies that make multiple
guesses, much as discussed in section 3.11 for the discrete case.


4.2.1  Ensuring  Convergence  of  Select 

A more serious issue is whether the operator SELECT is meaningful in the
continuous setting. Cause for concern stems from the possible infinite size of the
collection $\{R_{\beta,k}\}_{\beta \in B}$. If the randomized strategy must guess between an infinite
collection of states, then there is no guarantee that the probability of selecting the
correct preimage $R_{\beta,k}$ is non-zero. As an example, consider the problem in figure
4.3.  In  this  example  there  is  no  horizontal  position  sensing,  but  there  is  perfect 
vertical  position  sensing  and  perfect  velocity  control.  For  the  sake  of  example,  let 
us  assume  that  the  system  can  only  move  vertically.  The  goal  is  a  one-dimensional 
region  specified  by  the  slanted  line.  Clearly,  the  vertical  lines  drawn  above  the  goal 
are  all  preimages,  relative  to  a  termination  predicate  that  remembers  the  system’s 
start region. [Similarly for vertical lines below the goal, of course.] This is because if
the  system  knows  on  which  vertical  line  it  is  located,  then  it  knows  at  which  height 
to  stop  a  downward  motion  towards  the  goal.  Now  suppose  that  the  system  does  not 
know  its  horizontal  position,  and  thus  consider  a  randomized  strategy  that  decides 
to guess between the vertical lines. If the strategy guesses correctly, then the goal
will be attained. However, the probability of guessing correctly is zero! In short, the
randomized strategy is useless.¹

Figure 4.3: This example shows that preimages need not contain any interior, and
that there may be an infinite number of preimages. In the example, vertical position
sensing is perfect, horizontal position sensing is non-existent, and the system can only
move vertically, with perfect velocity control. The goal is a line in space. Preimages
are the vertical lines above the goal.

The  previous  example  makes  two  points:  (1)  That  there  may  be  an  infinite  number 
of  preimages  of  a  goal,  and  (2)  that  the  probability  of  guessing  the  correct  start  state 
by  guessing  between  an  infinite  number  of  preimages  may  be  zero.  However,  the 
example  was  highly  contrived,  in  that  control  uncertainty  was  taken  to  be  zero  and 
in  that  the  sensing  error  was  infinite  in  one  dimension  while  zero  in  another.  We 
will  now  explore  some  conditions  that  ensure  a  non-zero  probability  of  guessing  the 
correct  starting  state. 


Constraints  on  Guessing  Probabilities 

Let us suppose that we have a collection of sets $\{R_\alpha\}$ that covers another set $X$.
The set $X$ describes the possible starting locations of the system. Each set $R_\alpha$ is
a preimage. Here $\alpha$ is assumed to lie in some index set $A$. The operator SELECT
chooses one of the sets $R_\alpha$ by selecting an $\alpha$ from $A$. Once an $\alpha$ has been selected, the
system executes a strategy for attaining the goal from the preimage $R_\alpha$, as outlined
above and in chapter 3.

We assume that the choice of $\alpha$ is random. This means that we think of $A$ as a
measure space with some $\sigma$-algebra and some measure $\mu_A$ that determines how the
$\alpha$ are chosen. Thus, the probability that $\alpha$ will lie in the set $B \subseteq A$ is given by
$\mu_A(B)$. For instance, in the discrete case, $A$ was just a finite subset of the integers
$\{1, 2, \ldots, q\}$, and $\mu_A$ was given by $\mu_A(B) = |B|/|A|$ for every $B \subseteq A$.

Now, consider the actual state of the system $x \in X$. Let $\chi_{R_\alpha}$ be the characteristic
function of the set $R_\alpha$. In other words, $\chi_{R_\alpha}(x)$ is 1 if $x \in R_\alpha$, and 0 otherwise. If one
fixes $x$, and allows $\alpha$ to vary, then one can think of $\chi_{R_\alpha}(x)$ as a function of $\alpha$. Thus
the probability of correctly guessing a starting region $R_\alpha$ that contains the actual
state of the system is given by:

$p_c(x) = \int_A \chi_{R_\alpha}(x) \, d\mu_A(\alpha)$.

Said differently, $p_c(x) = \mu_A(B_x)$, where $B_x$ is the set of all $\alpha$ for which $x \in R_\alpha$.
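
For a finite index set the integral reduces to a weighted count. The following Python sketch computes the probability of a correct guess for a given state, using by default the counting measure of the discrete case; the function names and the representation of regions as Python sets are assumptions of the sketch, and the weights are expected to be exact `Fraction` values.

```python
from fractions import Fraction


def guess_success_probability(x, regions, weights=None):
    """Measure of the indices whose region contains x.

    `regions` is a list of containers, one per index; `weights` gives the
    measure of each index and defaults to the uniform counting measure.
    """
    n = len(regions)
    if weights is None:
        weights = [Fraction(1, n)] * n
    if sum(weights) != 1:
        raise ValueError("weights must sum to 1")
    return sum(w for region, w in zip(regions, weights) if x in region)


def worst_case_probability(states, regions, weights=None):
    """Smallest success probability over a finite set of candidate states."""
    return min(guess_success_probability(x, regions, weights) for x in states)
```

With `regions = [{1, 2}, {2, 3}, {3, 4}]`, the state 2 lies in two of the three regions, so a uniform guess is correct with probability 2/3, whereas the state 1 is covered by only one region.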

Now suppose that the state $x$ is non-deterministically distributed over the region
$X$. Thus an adversary could in principle choose $x$ so as to minimize the probability
of correctly guessing a region $R_\alpha$. Hence, in choosing $A$ and $\mu_A$ one must² satisfy the
constraint

(4.2)  $0 < \inf_{x \in X} p_c(x)$.

¹ We note in passing that the example of figure 4.3 did not satisfy the criterion of reliable goal recognition.
However, it is easy to modify the example in the manner outlined in the section on the partial
equivalence between sensorless and near-sensorless tasks (section 3.13.2), so as to achieve reliable
goal recognition while preserving the character of the example.

² The term 'must' is a bit stronger than necessary. With repeated guessing, it is fine if the probability
of guessing correctly is zero occasionally, so long as the sum of the success probabilities over an infinite
number of guesses is unity. However, constraint (4.2) is appropriate if one only considers individual
guesses.

The  constraint  says  that  the  probability  of  correctly  guessing  a  preimage  that  includes 
the  actual  state  of  the  system  is  non-zero,  independent  of  the  actual  state  of  the 
system.  Actually,  the  constraint  says  more,  namely  that  the  guessing  probabilities 
are  uniformly  bounded  away  from  zero.  This  ensures  that  the  expected  convergence 
time  is  finite  in  a  loop  that  repeatedly  guesses  the  start  state. 
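
The role of the uniform bound can be seen from a short calculation. Let $N$ denote the number of guesses up to and including the first correct one, and assume (a simplification made here for illustration) that every guess is correct with probability at least $p = \inf_{x \in X} p_c(x) > 0$, whatever the outcomes of the earlier guesses, and that a correct guess attains the goal.

```latex
% Assumption: each guess is correct with probability at least p > 0,
% regardless of the history of earlier guesses.
\Pr[N > n] \le (1 - p)^n
\quad\Longrightarrow\quad
E[N] \;=\; \sum_{n=0}^{\infty} \Pr[N > n]
     \;\le\; \sum_{n=0}^{\infty} (1 - p)^n
     \;=\; \frac{1}{p}.
```

If the probabilities $p_c(x)$ were merely positive at every $x$ but had infimum zero, no such geometric bound would be available.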

Similarly, if the state $x$ is randomly distributed over the set $X$, with probability
measure $\nu$, then one must satisfy the constraint

(4.3)  $0 < \int_X p_c(x) \, d\nu(x)$.

This  constraint  says  that  the  probability  of  guessing  correctly  is  non-zero.  The 
probability  in  this  case  is  evaluated  over  both  the  state  distribution  and  the  guessing 
distribution. 

Cautions 

There  are  two  cautions  that  should  be  mentioned  with  regard  to  repeated  guessing 
attempts  in  the  probabilistic  case. 

First,  we  must  be  careful  in  interpreting  the  probability  integral  of  constraint  (4.3). 
The probability $p_\nu = \int_X p_c(x) \, d\nu(x)$ is the probability of correctly guessing the starting
state assuming that the state of the system is randomly distributed in accord with the
distribution $\nu$. This means that over a large number of different problem instances
satisfying $\nu$, the fraction of times that the system correctly guesses the starting state
is given by $p_\nu$. It does not necessarily mean that repeated guessing during execution
of a single problem instance will yield a fraction of correct guesses that is roughly
$p_\nu$. This second interpretation is correct only if the distribution $\nu$ is created anew on
each guessing loop, for instance, by purposefully executing a randomizing strategy
that creates the distribution $\nu$. The point is that the actual state of the system on
a  particular  execution  trial  is  some  state  x.  If  pc(x)  is  zero,  then  the  system  has 
zero  probability  of  guessing  correctly.  Unless  the  system  is  made  to  change  state 
appropriately  between  guesses,  this  probability  will  remain  zero.  Thus,  even  in  the 
probabilistic  case,  it  often  makes  sense  to  satisfy  constraint  (4.2)  rather  than  merely 
constraint  (4.3). 

For  instance,  let  us  suppose  that  an  incorrect  guess  always  yields  a  strategy  that 
does  not  affect  the  state  of  the  system,  while  a  correct  guess  yields  a  strategy  that 
attains  the  goal.  This  is  of  course  a  strong  assumption,  but  it  will  serve  to  illustrate 
our  caution.  Since  the  caution  holds  even  in  this  special  case,  it  certainly  holds 
more  generally.  Ideally,  we  would  hope  that  the  probability  of  correctly  guessing  the 
starting state on the $n$th guess is given by

(4.4)  $(1 - p_\nu)^{n-1} \, p_\nu$.  (incorrect)

However, as we have said, this is an incorrect interpretation of $p_\nu$. Instead, the
probability of correctly guessing the starting state on the $n$th guess is given by

(4.5)  $\int_X (1 - p_c(x))^{n-1} \, p_c(x) \, d\nu(x)$.  (correct)

Summing  expression  (4.4)  over  an  infinite  number  of  guesses  yields  unity.  This 
need  not  be  true  for  expression  (4.5). 
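
A small numerical example makes the gap between the two expressions visible. In the Python sketch below the distribution over states is discrete, with two equally likely states: in one the guess is never correct, in the other it is always correct. As in the text, an incorrect guess is assumed to leave the state unchanged. The function names and the example distribution are assumptions of this sketch.

```python
from fractions import Fraction


def naive_nth_guess(p_nu, n):
    """Expression (4.4): treats p_nu as a per-guess success probability."""
    return (1 - p_nu) ** (n - 1) * p_nu


def correct_nth_guess(distribution, n):
    """Expression (4.5) for a discrete distribution given as (weight, p_c) pairs."""
    return sum(w * (1 - p) ** (n - 1) * p for w, p in distribution)


# Two equally likely states: guessing never succeeds in one, always in the other.
distribution = [(Fraction(1, 2), Fraction(0)), (Fraction(1, 2), Fraction(1))]
p_nu = sum(w * p for w, p in distribution)   # equals 1/2

naive_total = sum(naive_nth_guess(p_nu, n) for n in range(1, 61))
correct_total = sum(correct_nth_guess(distribution, n) for n in range(1, 61))
```

Here the partial sums of (4.4) approach one, whereas the partial sums of (4.5) remain at 1/2, which is the probability that the system starts in a state from which a correct guess is possible at all.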

This brings us to our second caution. It is generally true that the measure $\nu$ varies
on each guessing iteration. This is because each guess results in the execution of some
strategy that affects the state of the system. This further complicates the description
of success probabilities. Now the probability of correctly guessing the starting state
on the $n$th trial depends not only on the initial distribution $\nu$, as in expression (4.5),
but  also  on  the  previous  guesses.  We  will  not  examine  this  issue  in  any  detail. 


Success  Maximization 

One could also postulate conditions on $\mu_A$ for maximizing the probability of successful
goal  attainment.  This  would  involve  considering  the  effect  on  the  state  of  the  system 
of  executing  a  strategy  derived  from  an  incorrect  guess.  After  all,  in  some  cases  an 
incorrect  guess  can  still  lead  to  goal  attainment.  We  will  not  examine  these  conditions 
here. 


Comparison  of  Non-Deterministic  and  Probabilistic  Constraints 

The  difference  between  the  non-deterministic  constraint  (4.2)  and  the  probabilistic 
constraint (4.3) is the usual difference between a worst-case adversary and an
average-case behavior. If we rewrite constraint (4.2) as

(4.6)  $0 < \inf_{x \in X} \int_A \chi_{R_\alpha}(x) \, d\mu_A(\alpha) = \inf_{x \in X} \mu_A(B_x)$,

and if we rewrite constraint (4.3) as

(4.7)  $0 < \int_{A \times X} \chi_{R_\alpha}(x) \, d(\mu_A \times \nu) = (\mu_A \times \nu)(D)$,

where $D = \{(\alpha, x) \mid x \in R_\alpha\}$, then this difference becomes clearer. In the
non-deterministic case we want each of the slices $B_x$ of $D$ to have sufficient non-zero
measure in the space $A$. In the probabilistic case we merely want the set $D$ to have
non-zero measure in the space $A \times X$. If we go back to the example of figure 4.3,
and  imagine  that  the  starting  state  is  uniformly  distributed,  then  both  A  and  X  are 
essentially  equal  one-dimensional  intervals,  with  the  usual  measures.  The  set  D  of 
successful guess-state pairs is the diagonal in the space $A \times X$, and hence of measure
zero. If the goal were changed to a strip of finite width, then the vertical preimages
would become non-degenerate rectangles, so that $D$ would also become a strip of
finite width. Suddenly the probability of success would be non-zero, despite there
being an infinite number of preimages. See figures 4.4 and 4.5.³

Figure 4.4: If the goal line of figure 4.3 is changed to a strip of non-zero width,
then the preimages also become non-degenerate. This figure displays a typical such
preimage. It is a vertical strip of width $\ell$ that contains the starting state of the
system. See also figure 4.5.

[Labels in figures 4.4 and 4.5: starting state of the system; typical preimage $R_\alpha$; goal; range of preimages that contain $x$.]

Figure 4.5: This figure graphs as a function of the system's x-coordinate the set of
all preimages of figure 4.4 that contain the state of the system. $B_x$ is the set of
all preimages that contain $x$, and $D$ is the union of these sets over all $x$. See also
equations (4.6) and (4.7).
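
The strip example admits a closed form. Suppose, as an illustrative parameterization, that the horizontal coordinate is normalized to $[0, 1]$, that the preimages are the strips $R_\alpha = [\alpha, \alpha + \ell]$ with $\alpha$ ranging over $A = [0, 1 - \ell]$, and that $\alpha$ is chosen uniformly. Then the set of correct guesses for a state $x$ is the interval $[\max(0, x - \ell), \min(1 - \ell, x)]$, and the probability of a correct guess is its length divided by $1 - \ell$. The Python sketch below evaluates this; the function name and the uniform measure are assumptions.

```python
def strip_guess_probability(x, width):
    """Probability that a uniformly guessed strip [alpha, alpha + width],
    with alpha in [0, 1 - width], contains the horizontal coordinate x."""
    if not 0.0 < width < 1.0:
        raise ValueError("width must lie strictly between 0 and 1")
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    lo = max(0.0, x - width)          # smallest alpha whose strip still contains x
    hi = min(1.0 - width, x)          # largest alpha whose strip contains x
    return max(0.0, hi - lo) / (1.0 - width)
```

For a width of 0.2 the probability is 0.25 throughout the interior range of $x$ but falls to zero at the endpoints $x = 0$ and $x = 1$, so under the uniform measure the infimum over all states is zero even though the probability is positive at every interior state.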

4.2.2  Restricting  Select  to  Finite  Guesses 

Let  us  focus  on  ensuring  that  the  sets  Bx  each  have  non-zero  measure  in  the  non- 
deterministic  setting,  by  satisfying  (4.2),  since  satisfying  this  constraint  automatically 
ensures  that  (4.3)  holds  as  well.  We  will  do  this  effectively  by  modifying  the  definition 
of  SELECT  and  insisting  that  it  only  consider  finite  collections  of  covering  preimages. 
In  other  words,  the  index  set  A  is  forced  to  be  finite. 

It  is  clear  that  this  finiteness  requirement  imposes  a  fairly  strong  restriction  on 
SELECT.  In  particular,  the  example  of  figure  4.3  does  not  satisfy  the  requirement. 
Nonetheless, in many instances the finiteness arises naturally. Consider for instance
a set $R$ that is covered by an infinite collection of sets $\{R_\beta\}$. Now suppose further
that $R$ is bounded, and that in fact the interiors of the $R_\beta$ cover the closure of $R$, all
in the usual topology on $\mathbb{R}^n$. By compactness of the closure of $R$ it thus follows that
a finite subcollection of the $R_\beta$ must actually cover $R$, as desired.
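
In one dimension a finite subcover can often be extracted by a simple left-to-right sweep. The Python sketch below assumes that the infinite cover is available through a hypothetical function `interval_at(x)` that returns an open interval of the cover containing the point `x`. Compactness guarantees that a finite subcover exists, but it does not guarantee that this particular sweep finds one: if the intervals shrink toward some limit point the sweep may stall, hence the `max_steps` guard. The sweep is certain to finish when the interval lengths are bounded away from zero.

```python
def finite_subcover(a, b, interval_at, max_steps=10_000):
    """Sweep from a to b, collecting open intervals (lo, hi) that cover [a, b].

    `interval_at(x)` is an assumed interface returning an open interval of
    the cover that contains x.
    """
    chosen = []
    point = a
    for _ in range(max_steps):
        lo, hi = interval_at(point)
        if not (lo < point < hi):
            raise ValueError("interval_at(point) must be an open interval containing point")
        chosen.append((lo, hi))
        if hi > b:
            return chosen              # the last interval reaches past b
        point = hi                     # hi itself is not covered by an open interval
    raise RuntimeError("sweep did not finish within max_steps")
```

For example, covering $[0, 1]$ by the intervals $(x - 0.25, x + 0.25)$, one for every $x$, the sweep selects the five intervals centered at 0, 0.25, 0.5, 0.75, and 1.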

As  stated  so  far,  this  explanation  is  not  completely  satisfactory.  Among  other 
things,  the  explanation  does  not  properly  take  account  of  preimages  that  have  no 
interior in $\mathbb{R}^n$ because they lie on some surface of lower dimensionality. The main
task  remaining  in  this  chapter  is  therefore  to  make  more  precise  the  naturality  of  the 
finiteness  requirement.  The  explanation  just  given  provides  the  basic  outline  of  the 
argument. 


Forward  Projections 

In  order  to  motivate  the  insistence  on  coverage  by  interiors  of  preimages,  we  will 
consider  the  forward  projection  of  a  point  moving  with  a  commanded  velocity  that  is 
subject  to  non-zero  error.  We  defined  the  forward  projection  in  the  discrete  setting  on 
pages  94  and  102.  In  the  continuous  setting  we  need  a  time  index  as  well,  since  actions 
are  executed  over  some  interval  of  time.  Thus  let  us  define  the  forward  projection  at 
time  t ,  Fv.4  (/?),  to  be  the  set  of  configurations  that  the  system  might  be  in  at  time 
t,  given  that  it  started  out  in  the  region  R  at  time  zero,  and  moved  during  the  time 
interval  [0,  f]  with  commanded  velocity  Vo,  subject  to  control  uncertainty  as  defined 
earlier.  This  notation  differs  slightly  from  that  used  in  [Erd86].  If  one  is  not  interested 
in  any  particular  time,  then  one  may  consider  the  timeless  forward  projection 

3 Of course, ignoring possible boundary effects, pc(x) is now non-zero for all x, since there is an
interval of preimages that contain x, so in fact the guessing strategy would succeed even in the face
of a worst-case adversary. Notice, however, that in order to avoid zero probabilities of success at
the boundaries one has to almost unnaturally construct preimages that extend beyond the goal.
Alternatively, one could only consider preimages $R_\beta$ with $\beta$ in the range $A = [0, 1 - \ell]$, then insist
that the guessing distribution have positive measure on the atoms given by the endpoints 0 and $1 - \ell$,
as well as non-uniform measure near these endpoints. This is equivalent to constructing additional
preimages of width less than $\ell$ near the endpoints.



$$F_{v_0}(R) = \bigcup_{t > 0} F_{v_0,t}(R).$$

For  future  reference,  let  us  also  recall  the  definition  of  a  backprojection  from 
[Erd86]. In particular, the backprojection $B_{v_0}(G)$ of some region G is the set of
all configurations from which the system is guaranteed to pass through the set G at
some time, despite uncertainty, given that the commanded velocity is $v_0$. Effectively,
the forward projections encode the historical information available to the termination
predicate, while the backprojections encode the reachability. See [Erd86] for further
details. 

For  the  sake  of  having  a  focused  discussion,  we  will  assume  that  the  termination 
predicate  is  of  the  type  discussed  in  the  LMT  work.  In  particular,  in  addition 
to  the  commanded  velocity,  the  various  uncertainty  parameters,  and  a  description 
of  the  environment,  the  termination  predicate  is  given  the  following  information: 
Initially,  the  termination  predicate  is  given  the  start  region.  Thereafter,  at  every 
time  t  >  0,  the  termination  predicate  is  given  the  current  time,  and  the  current 
sensory  information.  Of  course,  a  particular  termination  predicate  may  only  consider 
some  of  this  information.  The  most  powerful  termination  predicate,  discussed  in 
[Mas84], remembers all information given to it. It is assumed that the termination
predicate  can  compute  forward  projections  of  any  set  for  any  time,  as  well  as  form 
arbitrary  unions  and  intersections  of  these  sets  with  themselves  and  with  sensory 
interpretation  sets.4  A  termination  predicate  signals  goal  attainment  when  its  current 
knowledge  state  is  inside  a  goal.  For  instance,  consider  a  termination  predicate  that 
remembers  the  start  region,  and  considers  the  current  sensory  information,  but  forgets 
past sensory information and does not look at the current time. If R is a preimage
of the goals $\{G_\beta\}$ relative to a commanded velocity $v_0$, this predicate will signal
goal attainment when the set $F_{v_0}(R) \cap B_{\epsilon_s}(x^*)$ is inside some goal $G_\beta$. Here $x^*$ is the
current sensed position and $B_{\epsilon_s}(x^*)$ is its interpretation set. More general descriptions
exist  for  more  general  sensors. 

Now5 suppose that whenever velocity $v_0$ is commanded the actual velocity lies in
some error ball $B_\epsilon(v_0)$. Note that $\epsilon$ may depend on $v_0$. To avoid trivialities let us
assume that $\epsilon < |v_0|$. If $x \in \Re^n$ lies in free space and $t$ is non-zero but small enough
so that the forward projection of $x$ lies in free space, then we have that

$$F_{v_0,t}(\{x\}) = B_{t\epsilon}(x + t v_0).$$

In other words, for all non-zero times the forward projection of a point in free space
is some open ball. Since $F_{v_0,t}(R) = \bigcup_{x \in R} F_{v_0,t}(\{x\})$, this says that for all non-zero
times the forward projection of a set R in free space is an open set. Now consider the
backprojection  of  some  goal  G,  relative  to  a  commanded  velocity  v0  and  suppose  that 

4 The term “compute” is used in a non-technical sense, that is, it is set-theoretic. Indeed, many
sets are not computable in the technical sense of computability theory. See [CR] and [CanbCj for
some results on the computability and complexity of forward projections. See also [Erd84].

5 Recall that $B_r(p)$ refers to the open ball of radius $r$ about the point $p \in \Re^n$.

Figure 4.6: A two-dimensional friction cone. Also shown is the computation of a net
force  given  an  applied  force. 


this backprojection lies wholly in free space. By construction, any preimage R under
velocity $v_0$ of G must lie in this backprojection. Suppose that $R \subset B_{v_0}(G)$, and that
$t$ is a non-zero time at which the system cannot yet have encountered the goal. Then,
even if R itself contains no interior, the set $F_{v_0,t}(R)$ is open and all points in it are
guaranteed to pass through the goal eventually. Of course, in general, the set $F_{v_0,t}(R)$
need not be a preimage of G even if R is. However, for special cases, for instance
if  the  termination  predicate  only  uses  the  timeless  forward  projection  and  current 
sensed  values  in  determining  goal  attainment,  and  if  G  is  closed,  then  this  forward 
projection  is  indeed  a  preimage  of  G  (see  [Erd86]).  Thus  there  is  strong  motivation 
for  considering  only  preimages  with  non-empty  interior  in  guessing  starting  regions. 
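
The open-ball claim for a single point in free space can be checked numerically. The following is a minimal sketch in Python and is not taken from the thesis: it assumes the actual velocity stays constant over the interval [0, t], although the text allows any velocity in the error ball, and the names forward_projection_ball, sample_velocity and reached_point are hypothetical.

```python
import math
import random


def forward_projection_ball(x, v0, eps, t):
    """Return (center, radius) of the ball B_{t*eps}(x + t*v0)."""
    center = tuple(xi + t * vi for xi, vi in zip(x, v0))
    return center, t * eps


def sample_velocity(v0, eps, rng):
    """Draw one velocity from the open ball B_eps(v0) by rejection sampling."""
    while True:
        d = [rng.uniform(-eps, eps) for _ in v0]
        if math.hypot(*d) < eps:
            return tuple(vi + di for vi, di in zip(v0, d))


def reached_point(x, v, t):
    """Position at time t when moving from x with constant velocity v."""
    return tuple(xi + t * vi for xi, vi in zip(x, v))


rng = random.Random(0)
x, v0, eps, t = (0.0, 0.0), (1.0, 0.0), 0.25, 2.0
center, radius = forward_projection_ball(x, v0, eps, t)
points = [reached_point(x, sample_velocity(v0, eps, rng), t) for _ in range(1000)]
# Every sampled end point should fall strictly inside the predicted ball.
all_inside = all(math.dist(p, center) < radius for p in points)
```

Sampling only illustrates the containment; it does not show that every point of the ball is reached, which follows from the fact that each point of the ball corresponds to some velocity in the error ball.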

Collisions  and  Friction 

In  order  to  account  for  contact  with  obstacles,  consider  how  the  velocity  error  ball 
is modified by collisions with lower-dimensional surfaces in $\Re^n$. We will assume that
all surfaces are piecewise linear, as is reasonable for polyhedral obstacles, and that
friction is isotropic and invariant across any such planar patch. In particular, for
hyperplanes of dimension $n - 1$ friction is described by an n-dimensional cone with
cone angle $\alpha = \arctan \mu$, where $\mu$ is the coefficient of friction.6 The axis of this
cone is the normal to the hyperplane. See figure 4.6 for the two-dimensional case in
$\Re^2$. The effect of friction on the intersection of several such surfaces is determined
by the generalized damper analogue of Newton’s equations. In practice, we are
thinking of $\Re^2$ and $\Re^3$. The description of friction in configuration spaces involving
object rotations is slightly more complicated. In particular, the effective friction in
configuration space may vary from configuration to configuration. Also, friction need
not appear for certain tangential motions, such as those involving pure rotations.
[See [Erd84].] For the case $\Re^2$, the computation of an effective motion is determined


6 We will assume that static and sliding friction are equal.

by projecting the applied force onto the friction cone as indicated in figure 4.6. In
particular, if the negative applied force $-F_A$ lies inside the friction cone then there
is no resulting motion. Otherwise, the net force is given by $F_A + F_r$, where $F_r$ is
a reaction force on the edge of the friction cone, whose normal component directly
cancels the normal component of $F_A$. For generalized damper dynamics, forces and
velocities are equivalent, in the sense that the applied force is the term $B v_0$ in the
equation $F = B (v - v_0)$.
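
The two-dimensional rule just described (stick if the negative applied force lies in the friction cone, otherwise slide with a reaction force on the cone edge) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the thesis's implementation: the surface is a single edge with outward unit normal `normal`, static and sliding friction are equal, and the function name net_force_2d and its return labels are hypothetical.

```python
import math


def net_force_2d(f_applied, normal, mu):
    """Return (state, net_force) for an applied force at a single 2D edge contact.

    normal is the outward unit normal of the edge; mu is the friction coefficient.
    """
    fx, fy = f_applied
    nx, ny = normal
    c = fx * nx + fy * ny  # normal component of the applied force
    if c > 0:
        # The force points away from the edge: no reaction, motion through free space.
        return "free", f_applied
    # Tangential part of the applied force.
    tx, ty = fx - c * nx, fy - c * ny
    ft = math.hypot(tx, ty)
    g = -mu * c  # largest friction magnitude the edge can supply
    if ft <= g:
        # The negative applied force lies inside the friction cone.
        return "stick", (0.0, 0.0)
    # Sliding: the normal component is cancelled and the tangential part shrinks by g.
    scale = (ft - g) / ft
    return "slide", (tx * scale, ty * scale)


print(net_force_2d((1.0, -1.0), (0.0, 1.0), 0.5))
print(net_force_2d((0.25, -1.0), (0.0, 1.0), 0.5))
```

The sticking test uses the equivalent tangent form of the cone condition: the angle between the negative applied force and the normal is at most arctan of the friction coefficient exactly when the tangential magnitude is at most the friction coefficient times the normal magnitude.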

More generally, if contact exists on some plane P of dimension $n - k$ in $\Re^n$, then
given an applied force $F_A$, an effective motion is computed as follows. One can think
of the plane of dimension $n - k$ as being the intersection of k independent hyperplanes
of dimension $n - 1$. To say that the system is in contact with the plane P is to say
that it is in contact with each of the individual hyperplanes. Let the outward unit
normals to these hyperplanes be given by the vectors $n_1, \ldots, n_k$.7 Then the friction
cone at the ith point of contact is given by all forces that form an angle with the
outward normal $n_i$ that is no greater than $\alpha = \arctan \mu$. In other words, the friction
cone is the set of forces

$$\mathcal{F}_i = \left\{ F \;\middle|\; F \cdot n_i \ge \frac{|F|}{\sqrt{1 + \mu^2}} \right\}.$$

Said differently, the set of reaction forces that can be generated by the ith hyperplane
is given by the set $\mathcal{F}_i$. The composite friction cone due to contact with the plane P
is simply the vector sum of the individual friction cones. In other words, the set of
possible reaction forces is the set $\mathcal{F}_P$, where

$$\mathcal{F}_P = \left\{ F \;\middle|\; F = \sum_{i=1}^{k} F_i, \text{ with } F_i \in \mathcal{F}_i \right\}.$$


Given an applied force $F_A$, there are two possibilities. Either the system moves or
it sticks. Consider the case in which it sticks. Then we must have that the reaction
force $F_r$ is of the form $F_r = -F_A$. In other words, $-F_A \in \mathcal{F}_P$. Conversely, if
$-F_A \in \mathcal{F}_P$ then a possible motion solution is given by sticking, with reaction force
$F_r = -F_A$.

We  will  consider  the  other  possibility,  in  which  the  system  moves,  presently.  First, 
let  us  observe  that  in  general  the  effective  contact  may  not  involve  actual  contact  with 
all the hyperplanes that define the plane P. For instance, if the applied force points
away from each of these planes, that is, if $F_A \cdot n_i > 0$ for all $i = 1, \ldots, k$, then
the system is effectively not in contact with any of the hyperplanes. Thus a possible
reaction force would be $F_r = 0$, meaning that the motion of the system would be
through free space, along the direction specified by $F_A$.

In  general,  any  subset  of  the  k  hyperplanes  defining  the  contact  with  P  might 
actually constitute the effective contact. Any such smaller contact set defines a
higher-dimensional plane of contact. In principle one therefore needs to recursively check all

7 In general, we should also allow redundant constraints and perhaps different coefficients of
friction on the different hypersurfaces. The discussion may be generalized to include these.

these smaller contact sets for possible reaction forces. For each of these one computes
the  net  resulting  motion,  in  the  manner  to  be  outlined  presently.  If  such  a  net  motion 
is  consistent  with  all  the  kinematic  constraints,  then  it  is  a  feasible  solution  to  the 
motion  problem. 

In  the  most  general  case,  there  may  be  several  solutions  consistent  with  the  applied 
force $F_A$. This is particularly true when in contact with low-dimensional surfaces. The
resulting  motion  may  depend  on  one’s  interpretation  of  these  contacts.  For  instance, 
in  two  dimensions,  contact  with  a  convex  vertex  may  be  thought  of  in  one  of  three 
ways:  contact  with  the  edge  on  one  side  of  the  vertex,  contact  with  the  edge  on 
the  other  side  of  the  vertex,  or  simultaneous  contact  with  both  edges.  Thus  several 
possible  contact  states  may  be  consistent  with  Newton’s  equations  and  Coulomb 
friction  (or  their  analogues  under  generalized  damper  dynamics).  This  ambiguity 
may  introduce  further  non-determinism  into  the  system’s  behavior. 

Let  us  assume  that  the  effective  contact  is  given  by  all  the  k  hyperplanes  that 
define  the  plane  P.  Now  consider  the  possibility  that  the  system  moves.  We  can 
write the applied force as $F_A = F_n + F_t$, where $F_n$ lies in the normal space spanned
by the k normals, and $F_t$ is parallel to the plane P. Similarly, since contact with P
is maintained, observe that the reaction force is of the form $F_r = -F_n - g\,t$, where
$g \ge 0$ and $t$ is some unit vector parallel to the plane P. We will see shortly that
$t$ is positively parallel to the tangential component $F_t$ of the applied force. In any
event, the net force is of the form $F_{net} = F_t - g\,t$. If this vector is non-zero, then it
specifies  the  direction  of  motion,  in  the  plane  P. 

Since  the  system  is  moving,  and  in  contact  with  each  of  the  hyperplanes,  the 
isotropy  assumption  implies  that  the  reaction  force  at  each  of  the  contributing 
hyperplanes  must  lie  on  the  edge  of  its  respective  friction  cone.8  Furthermore,  each 
reaction  force  must  oppose  the  direction  of  motion.  This  says  that  the  tangential 
component of the reaction force at each of the k hyperplanes is actually anti-parallel
to the vector $F_{net}$. In turn, this means that $t$ is positively parallel to $F_{net}$, and
hence to $F_t$. We see therefore that the frictional part of the reaction forces does
not contribute to the maintenance of contact with the plane P, but merely to the
reduction of tangential motion. By construction we have that $F_n = c_1 n_1 + \cdots + c_k n_k$,
for some set of constants $\{c_i\}$. It follows that each of the $c_i$ in the description of $F_n$
must be zero or negative, for otherwise the normal reaction forces at the points of
contact would be negative, a physical impossibility. The scalar $g$ thus may be written
as the sum $g_1 + \cdots + g_k$, where friction dictates that $0 \le g_i \le -\mu c_i$, for all i.

In short, for contact with the plane P under applied force $F_A$, there are two
possibilities that do not involve the breaking of contact. First, the negative applied
force may lie inside the composite friction cone $\mathcal{F}_P$, in which case the resulting motion
may  be  zero.  Of  course,  under  certain  indeterminacies  a  tangential  motion  may  be 
possible  as  well,  for  instance,  if  we  permit  a  set  of  dependent  normals,  that  is,  a 

8 This need not be true in general configuration spaces, such as those involving rotations, or
multiple moving objects that do not interact. In those spaces the isotropy assumption does not
hold. Generalizations of the procedure for computing net motions apply, although the specific
conclusions need not.

set of redundant hyperplanes, along with different coefficients of friction on each of
the hyperplanes. Second, if the negative applied force lies outside of the composite
friction cone, and contact is not broken, then a tangential motion must result. If
a tangential motion does occur, then the tangential reaction force has magnitude
$g = -\mu (c_1 + \cdots + c_k)$, with $c_i \le 0$ for all i. The resulting motion is determined by the
net force $(|F_t| - g)\,t$, which points in the same tangential direction as the applied
force, but has a smaller magnitude. [Note that this only makes sense if $g < |F_t|$.]

Forward  Projections  on  Surfaces 

The  discussion  of  the  last  few  paragraphs  is  intended  partly  as  a  quick  review  of 
friction.  The  main  purpose,  however,  is  to  indicate  that  the  forward  projection  of  a 
point  on  a  surface,  by  those  velocities  in  the  velocity  error  ball  that  maintain  contact 
with  the  surface,  forms  a  set  that  is  open  in  the  relative  topology  of  the  surface. 
In  other  words,  suppose  x  is  some  point  on  a  plane  P  of  dimension  n  —  k  as  above. 
Consider applying a nominal commanded velocity $v_0$ subject to the usual uncertainty
considerations. There are two possibilities. Either the point moves away from the
surface, or it maintains contact. Of course, in some cases, over time the point may
be able to do both, and intermittently hop back and forth between free space and
contact  space,  and  perhaps  between  surfaces  of  different  dimensionality. 

Suppose the commanded velocity is $v_0$ and that all effective commanded velocities
lie in the open ball $B_\epsilon(v_0)$. Suppose further that the system is in contact with a
plane P of dimension $n - k$, formed by the intersection of k hyperplanes. Let the
independent unit normals of the defining k hyperplanes be given by $n_1, \ldots, n_k$. Then
any vector can be written as $v = \sum_{i=1}^{k} c_i n_i + h\,t$, where the $\{c_i\}$ and $h$ are scalars and
$t$ is some unit tangent vector parallel to the plane P. Assuming generalized damper
dynamics with an identity damping matrix B, we will think of forces and velocities as
equivalent. The set of velocities in $B_\epsilon(v_0)$ that can maintain contact with the plane
P is given by

$$B_{contact} = \left\{ v \in B_\epsilon(v_0) \;\middle|\; v = \sum_{i=1}^{k} c_i n_i + h\,t, \text{ for some } h \text{ and } t, \text{ with } c_i \le 0 \text{ for all } i \right\} \;\cup\; \left\{ v \in B_\epsilon(v_0) \;\middle|\; -v \in \mathcal{F}_P \right\}.$$


The set of all velocities that must break contact is given by $B_{break} = B_\epsilon(v_0) - B_{contact}$.
By breaking contact we mean simply that the instantaneous contact may be thought
to occur either in free space or on a plane of higher dimension. Contact state may
also be changed by other means, say by sliding off the boundary of the plane and into
free space. That may be viewed as a change of state, in that the system is in contact
at some time $t = t_0$, but is in free space at time $t > t_0$.

Finally, the set of velocities that can break contact completely is given by

$$B_{free} = \{ v \in B_\epsilon(v_0) \mid v \cdot n_i > 0,\ i = 1, \ldots, k \}.$$

These are all velocities for which the system could move into free space.

The set $B_{contact}$ is relatively closed in the ball $B_\epsilon(v_0)$, so the set $B_{break}$ is open.
More to the point, we see that the set of velocities that can break contact completely,
that is, the set $B_{free}$, is an open set. Thus, by an argument similar to the one given
above, the portion of the forward projection that arises solely from velocities that
move through free space is an open subset of free space.
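
The distinction between velocities that keep contact and velocities that break contact completely can be illustrated with a small classifier. This is a hedged Python sketch, not the thesis's procedure: it assumes the k normals are orthonormal, so that each normal coefficient is just a dot product, and it checks only the first set in the definition of the contact velocities (all normal coefficients non-positive). The sticking set, which needs a friction cone test, is deliberately left out. The names dot and classify_velocity are hypothetical.

```python
def dot(a, b):
    """Euclidean inner product of two equal-length tuples."""
    return sum(x * y for x, y in zip(a, b))


def classify_velocity(v, normals):
    """Classify v against k contact hyperplanes with orthonormal outward normals."""
    comps = [dot(v, n) for n in normals]
    if all(c > 0 for c in comps):
        return "free"     # strictly leaves every hyperplane
    if all(c <= 0 for c in comps):
        return "contact"  # pushes into, or slides along, every hyperplane
    # Neither test holds: deciding membership in the contact set would
    # additionally need the sticking test, which this sketch does not perform.
    return "other"


# Contact with an edge in three dimensions: two hyperplanes, normals along y and z.
normals = [(0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
labels = [classify_velocity(v, normals)
          for v in [(1.0, 1.0, 1.0), (1.0, -1.0, -1.0), (1.0, 1.0, -1.0)]]
print(labels)
```

The strict inequality in the "free" test mirrors the openness of the set of velocities that break contact completely, while the non-strict inequality in the "contact" test mirrors the relative closedness of the contact set.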

Let  us  focus  now  on  the  contact  with  the  plane  P,  and  show  that  the  forward 
projection  of  a  point  on  this  plane,  by  velocities  in  the  error  ball  that  can  maintain 
contact  with  the  plane,  contains  an  interior  in  the  relative  topology  of  the  plane. 
Given an applied force or velocity $v$, let us denote by $\pi(v)$ the resulting net force or
velocity, assuming generalized damper dynamics and contact with P. For some $v$,
$\pi(v)$ will just be zero, that is, no motion will result. We would like to show that the
set of net velocities, given by $\pi(B_{contact})$, is a set with interior (relative to the topology
of $\Re^{n-k}$). In fact, we will show that almost all resulting velocities are interior to
the set $\pi(B_{contact})$. The only exception will be in some cases the zero velocity. This
implies that if one only considers non-sticking contact velocities, then the forward
projection of any point will be an open set in the relative topology of the contact
plane. The argument is the same as for the free space case except that now the
velocity error ball is replaced by some other open set of possible velocities. If there
are sticking velocities in the projected velocity error ball then the forward projection
of a region R will be the union of some relatively open set determined by the
non-sticking velocities and the region R itself. Thus the forward projection contains an
interior.  More  importantly,  if  one  is  only  interested  in  preimages,  then  there  can  be 
no  sticking  velocities,  as  otherwise  one  could  not  guarantee  goal  attainment,  so  the 
forward  projection  with  respect  to  contact  velocities  is  a  relatively  open  set. 

Let  us  therefore  consider  those  velocities  for  which  the  resulting  motion  is  not 
zero. By the discussion above, we can write the effect of $\pi$ on such a velocity as:

$$\pi\left( \sum_{i=1}^{k} c_i n_i + h\,t \right) = \left( h + \mu \sum_{i=1}^{k} c_i \right) t.$$

Now suppose $v = h\,t$, that is, suppose all of the $c_i$ are zero. Then $\pi(v) = v$, that
is, $\pi$ restricted to velocities with no normal component is just the identity map. More
generally, if one fixes the constants $\{c_i\}$ at some set of values, then one can think of $\pi$
as a self-mapping between tangent vectors in the plane of contact. The mapping is a
form of shifting given by $\pi_{\{c_i\}}(h\,t) = (h + c)\,t$, with $c = \mu \sum_{i=1}^{k} c_i$, and $h > 0$. Clearly
$\pi_{\{c_i\}}$ is well-defined for all non-zero tangent vectors. However, the assumption that
the applied velocity results in motion means that we are only applying $\pi$ to vectors
for which $h > -c \ge 0$. Let us define two sets of tangent vectors in the plane of
contact. Let $V_0$ be the set of all tangent vectors which can be written in the form
$h\,t$ with $h > 0$, and let $V_c$ be the set of all tangent vectors $h\,t$ with $h > -c$. Then
$\pi_{\{c_i\}}$ is a one-to-one mapping of $V_c$ onto $V_0$. Thus $\pi_{\{c_i\}}$ possesses a two-sided inverse,
mapping $V_0$ onto $V_c$. The inverse is given by $\pi_{\{c_i\}}^{-1}(\ell\,t) = (\ell - c)\,t$, for all unit tangent
vectors $t$ and scalars $\ell > 0$. It is clear that this inverse is a continuous function, and
so we see that $\pi_{\{c_i\}}$ is an open map from $V_c$ to $V_0$.
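
Because the tangent direction is left unchanged, the shifting map and its inverse act only on the scalar magnitude along a fixed unit tangent. A minimal Python sketch of that scalar action follows; the function names tangential_shift, pi_shift and pi_shift_inverse are hypothetical, and the numbers are chosen only for illustration.

```python
def tangential_shift(mu, normal_coeffs):
    """Return c = mu * sum(c_i); non-positive when every c_i is non-positive."""
    return mu * sum(normal_coeffs)


def pi_shift(h, c):
    """Map the applied tangential magnitude h to the net magnitude h + c."""
    if h <= -c:
        raise ValueError("no motion results: h must exceed -c")
    return h + c


def pi_shift_inverse(l, c):
    """Map a positive net magnitude l back to the applied magnitude l - c."""
    if l <= 0:
        raise ValueError("net magnitude must be positive")
    return l - c


c = tangential_shift(0.5, [-1.0, -0.5])  # friction coefficient 0.5, two contacts
image = pi_shift(2.0, c)
recovered = pi_shift_inverse(image, c)
print(c, image, recovered)
```

Both directions are translations of the magnitude by a constant, which is why the map is a continuous bijection with a continuous inverse between the two sets of tangent vectors.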

Now fix a particular non-zero image vector of $\pi$ applied to $B_{contact}$. This vector
is of the form $\pi(v)$, for some vector $v = \sum_{i=1}^{k} c_i n_i + h\,t$ in the set $B_{contact}$, with
$h > -c = -\mu \sum_{i=1}^{k} c_i \ge 0$. Consider the set of velocities

$$O = \left\{ w \in B_{contact} \;\middle|\; w = \sum_{i=1}^{k} d_i n_i + v_t,\ |d_i - c_i| < \delta \text{ and } |v_t - h\,t| < \delta \right\}.$$

Here $\delta$ is some small positive number, and $v_t$ is any tangent vector parallel to
the contact plane P. The set $O$ is an open neighborhood in $B_{contact}$ of the
velocity $v$. If $\delta$ is chosen small enough then one can guarantee that all the
vectors in the right part of the set definition actually lie inside the velocity error
ball since it is an open ball. This says that $O$ wholly contains the set
$O_c = \{ w \mid w = \sum_{i=1}^{k} c_i n_i + v_t, \text{ with } |v_t - h\,t| < \delta \}$. Note that in this last set
the normal components are all the same, determined by the $\{c_i\}$ that define $v$.
Viewing this set as a subset of the tangent vectors to the contact plane P, we see that
it is relatively open and a subset of $V_c$. But this says that the image $\pi_{\{c_i\}}(O_c)$ is an
open neighborhood of $\pi(v)$, and thus $\pi(O_c)$ is a neighborhood of $\pi(v)$. This shows
that $\pi(B_{contact}) - \{0\}$ is an open set in the topology of $\Re^{n-k}$.


Contact  Changes 

Finally,  suppose  that  we  model  obstacles  as  closed  sets.  Additionally,  each  plane 
of  any  dimension  is  a  closed  set.  Now  consider  the  possible  contact  changes  for  a 
portion  of  the  forward  projection  that  is  an  open  set  relative  to  its  contact  state.  For 
instance, consider an open ball of dimension $n - k$ on a plane of dimension $n - k$, and
consider its collision with a subplane of dimension $n - k - i$. The intersection with
the lower-dimensional plane is necessarily relatively open in that plane. Conversely,
suppose that at time $t = t_0$ an open ball of dimension $n - k - i$ prepares to lift off
from the subplane of dimension $n - k - i$, moving off into the containing plane
of dimension $n - k$ for all times $t > t_0$. Given only velocities for which it is possible
to move on the plane of dimension $n - k$, the arguments above show that this ball
forward projects into an open set in the relative topology of the containing plane for
all times $t > t_0$.


Brief  Summary 

In  short,  we  have  shown  that  the  forward  projection  of  a  set  relative  to  an  open 
velocity  uncertainty  ball  contains  interior  relative  to  each  of  the  contact  states  it 
defines.  The  argument  above  is  not  a  formal  proof,  but  it  does  provide  some  intuition 
and  some  motivation  for  insisting  that  the  operator  SELECT  only  guess  between  finite 
collections of preimages.

Compactness Argument

We  are  now  in  a  position  to  state  the  compactness  argument  more  generally.  First, 
let  us  write  the  reachable  state  space  X  in  the  form 

$$X = K_n \cup \cdots \cup K_1 \cup K_0,$$

where $K_i$ is the closure of the set of all points that lie on a plane of dimension i. This
means that $K_n$ is the closure of the set of all points in X that lie in free space, while
$K_0$ is the set of all vertices of the obstacle polyhedra. We assume that all polyhedra
have full dimensionality, that is, are formed by sets of hyperplanes of dimension $n - 1$.
Then we see that $K_n \supset \cdots \supset K_1 \supset K_0$.

If we are given a region $R \subset X$, we can thus form the unique union
$R = R_n \cup \cdots \cup R_0$, where $R_i = R \cap K_i$. Similarly, given a collection of preimages $\{R_\beta\}$
that cover R, we can form the collections $\{R_{\beta,n}\}, \ldots, \{R_{\beta,0}\}$, where $R_{\beta,i} = R_\beta \cap K_i$,
for all $\beta$ and i. Notice that each $R_{\beta,i}$ is a preimage since the subset of a preimage
is always also a preimage. Clearly, each collection $\{R_{\beta,i}\}$ covers the dimensionally
corresponding subset $R_i$ of R. If we assume that the set R is compact to begin
with, and that the preimages $\{R_{\beta,i}\}$ are open in the relative topology of $K_i$, then in
fact a finite number of these preimages will cover $R_i$. Thus SELECT can naturally
choose between a finite set of preimages. Actually, we can further loosen the openness
requirement on the preimages, and merely ask that each of the sets $R_i \cap R_{\beta,i}$ be open
in the relative topology of $R_i$. This permits the preimages to contain some extra limit
points.9
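
The role of compactness can be made concrete in one dimension, where a closed interval covered by open intervals always admits a finite subcover. The sketch below is only an illustration in Python under simplifying assumptions: the covering family is given as a finite list sampled from what would in general be an infinite family, the region is a single interval, and the function name finite_subcover is hypothetical.

```python
def finite_subcover(intervals, lo, hi):
    """Greedily pick open intervals (a, b) that together cover [lo, hi].

    Returns the chosen intervals, or None if the family does not cover [lo, hi].
    """
    chosen = []
    p = lo  # leftmost point not yet known to be covered
    while True:
        # Open intervals containing p; take the one that reaches furthest right.
        candidates = [(a, b) for (a, b) in intervals if a < p < b]
        if not candidates:
            return None
        best = max(candidates, key=lambda ab: ab[1])
        chosen.append(best)
        if best[1] > hi:
            return chosen
        p = best[1]  # an open interval misses its own right endpoint


# 101 overlapping open intervals of width 0.3 centred on a grid over [0, 1].
family = [(i / 100 - 0.15, i / 100 + 0.15) for i in range(101)]
cover = finite_subcover(family, 0.0, 1.0)
print(len(family), len(cover))
```

A guessing step restricted to the few selected intervals would still cover every point of the region, which is the sense in which a finite choice suffices.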

Preimages  and  Forward  Projections 

We  have  tried  to  motivate  the  discussion  of  open  preimages  or  preimages  with  interior 
by  showing  that  the  forward  projection  naturally  contains  interior  in  each  dimension, 
for tasks in $\Re^n$ that involve polyhedral obstacles with simple friction, and that use
generalized  damper  dynamics  subject  to  non-vanishing  control  uncertainty.  If  one 
therefore  insists  that  preimages  at  least  contain  their  forward  projections  for  a  small 
period  of  time,  then  one  can  guarantee  that  preimages  contain  interior  as  well. 

Clearly there will still be some problems for which infinite coverage by preimages
without interior is unavoidable. In such cases, if the unrestricted version of SELECT is
to function properly, one must satisfy one of the conditions (4.2) or (4.3). In general,
this will entail looking at each particular guessing step individually, then determining
an appropriate index set A and guessing distribution. However, for many problems
one  may  restrict  SELECT  to  finite  decisions. 

Let  us  briefly  also  indicate  why  it  is  reasonable  to  insist  that  preimages  contain 
part  of  their  forward  projection.  For  special  cases,  as  we  have  noted,  it  is  almost 
automatic.  More  generally,  the  argument  is  very  similar  to  the  one  used  to  establish 
the openness of the forward projection. Consider a preimage R and its forward

9 For instance the semi-open interval $[0, \frac{1}{2})$ is open in the relative topology of the closed interval
$[0, 1]$.

projection $F_{v_0,t}(R)$ at some time $t > 0$. We claim that if $t$ is small enough, then
a  relatively  open  subset  of  this  forward  projection  is  itself  a  preimage.  This  does 
not  quite  say  that  R  contains  any  interior,  but  it  does  say  that  there  is  a  preimage 
naturally  derivable  from  R  that  does  contain  interior  (in  fact  is  open).  In  order  to 
make  the  argument  we  must  make  two  further  assumptions:  (1)  That  the  sensing 
uncertainty  is  a  non-degenerate  open  ball,  and  (2)  that  there  is  some  minimum  time 
$t_{min} > 0$ before which goal attainment and recognition is impossible from the preimage
R.  One  can  probably  remove  this  second  assumption,  but  we  will  not  worry  about 
that  here.  Additionally,  we  will  focus  on  the  case  in  which  the  forward  projection  lies 
in  free  space  and  in  which  the  only  sensor  is  a  position  sensor. 

In  order  to  be  concrete,  let  us  suppose  that  R  is  a  preimage  of  some  collection  of 
goals $\{G_\beta\}$, relative to the commanded velocity $v_0$ and some termination predicate.
Suppose that the control uncertainty ball has radius $\epsilon = \epsilon(v_0)$, while the position
sensing uncertainty ball has radius $\epsilon_s$. Choose $t_0 > 0$ to be smaller than both $t_{min}$
and $\frac{\epsilon_s}{2(|v_0| + \epsilon)}$. In other words, in the time $t_0$, the furthest any point can move is half
the radius of the sensing uncertainty ball. Now consider any subset $R_0$ of R whose
diameter is less than $\epsilon_s/2$. Let $F = F_{v_0,t_0}(R_0)$, which is a relatively open subset of
$F_{v_0,t_0}(R)$. We would like to establish that F is a preimage of the collection $\{G_\beta\}$,
relative  to  the  commanded  velocity  v0  and  the  same  termination  predicate  as  that 
used  for  the  preimage  R. 
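
The choice of the short time and of the small starting subset can be sketched numerically. The Python fragment below is an illustration only: it assumes the fastest possible actual speed is the commanded speed plus the control error radius, which follows from the error ball model, and the names choose_t0 and projection_stays_in_sensing_ball, as well as the `fraction` safety parameter, are hypothetical.

```python
def choose_t0(t_min, eps_s, v0_norm, eps, fraction=0.5):
    """Pick a time below both t_min and eps_s / (2 * max_speed)."""
    max_speed = v0_norm + eps  # no actual velocity in the error ball is faster
    bound = min(t_min, eps_s / (2.0 * max_speed))
    return fraction * bound


def projection_stays_in_sensing_ball(diameter_r0, t0, eps_s, v0_norm, eps):
    """Triangle inequality check: starting spread plus travel stays below eps_s."""
    return diameter_r0 + t0 * (v0_norm + eps) < eps_s


t_min, eps_s, v0_norm, eps = 1.0, 0.2, 1.0, 0.25
t0 = choose_t0(t_min, eps_s, v0_norm, eps)
travel = t0 * (v0_norm + eps)
ok = projection_stays_in_sensing_ball(0.09, t0, eps_s, v0_norm, eps)
print(t0, travel, ok)
```

With a starting subset of diameter below half the sensing radius and travel below half the sensing radius, every reachable point stays within the sensing radius of any fixed starting point, so the sensor cannot distinguish positions over that interval.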

In order to establish that F is a preimage we must show that any trajectory
starting in this set is guaranteed to terminate recognizably inside a goal. First, let us
note that all trajectories emanating from F must pass through a goal by the definition
of R and F. Next,10 consider a motion starting in F at time $t' = 0$. Let us determine
the information available to the termination predicate at time $t' = 0$. In line with
the discussion on page 209, the termination predicate is given the start region F, the
time $t' = 0$, and whatever sensed value $x^*$ is returned by the sensor at time $t' = 0$.
In general, $x^*$ can be any sensory value consistent with a starting position in F. Of
course, it is possible that the particular termination predicate employed will ignore
some or all of this information. Let us denote by $K_{x^*}$ the knowledge state derived by
the termination predicate from the information it is given at time $t' = 0$.

Since R is a preimage, so is $R_0$. Consider therefore a motion emanating from
the set $R_0$. Fix some $x_0 \in R_0$. Given an adversarial sensor, we may assume that
the sensory value returned for all times $t$ in the range $[0, t_0)$ is $x_0$. By construction,
the forward projection of $R_0$ at any such time is contained inside the sensing error
ball about $x_0$, that is, $F_{v_0,t}(R_0) \subset B_{\epsilon_s}(x_0)$ for all $t \in [0, t_0]$. In short, the sensors
contribute nothing over that time interval to the termination predicate’s decision.
Thus the knowledge state K available to the termination predicate at time $t = t_0$
(before sensing) is some superset of F, and possibly equal to F. Since the motion
starting from $R_0$ at time $t = 0$ is guaranteed to terminate recognizably in a goal, the
motion starting from F at time $t = t_0$ with knowledge state K and sensor value $x^*$


l0We  use  the  notation  l'  to  indicate  that  this  time  is  not  directly  related  to  the  clock  used  in 
executing  a  motion  from  the  preimage  R. 


218 


CHAPTER  4.  PREIMAGES 


Figure  4.7:  This  figure  shows  a  typical  backprojection  of  a  two-dimensional  disk. 


must  terminate  recognizably  in  a  goal.  Denote  by  K *  the  knowledge  state  formed 
from  K  and  z*.  Now  consider  again  the  termination  predicate  that  starts  a  motion  in 
the  set  F  at  time  t'  =  0.  Recall  that  the  knowledge  state  available  to  this  termination 
predicate  is  A'x..  It  is  reasonable  to  assume  that  the  knowledge  state  Km  is  a  superset 
of  the  knowledge  state  A'x.,  which  establishes  that  F  is  a  preimage.  11 
One  observes  that  a  similar  argument  shows  that  the  set 

    $\bigcup_{0 < t < t_0} F_{v_0,t}(R_0)$

is  also  a  preimage  that  is  an  open  subset  of  the  forward  projection  of  R,  assuming  that 
the  termination  predicate  does  not  consider  time.  More  general  forward  projections 
are  preimages  if  the  termination  predicate  only  considers  current  sensed  values. 

Finite  Guesses 

A  final  comment  should  be  made.  We  have  thus  far  indicated  the  existence  of 
preimages  with  interior.  We  have  not  yet  motivated  the  naturality  of  insisting  that 
open  preimages  cover  a  compact  guessing  region.  This  condition  is  desirable  since 
it ensures guessing finiteness. However, there are many cases in which the guessing
region is open rather than closed. We show now, by example, that this poses no
serious problem.

¹¹ One can imagine termination predicates that randomly choose starting knowledge states that
include the actual starting region, but we will exclude those. Most likely $K^*$ and $K_{x^*}$ are actually
equal. This is certainly true for most of the variations of termination predicates discussed in [Erd86].

Figure 4.8: The union of all possible backprojections of the disk of figure 4.7 is a disk
of radius $4r$. This figure shows how to split the disk into a finite number of regions.
A guessing strategy that cannot sense the position of the system can thus guess the
correct region with non-zero probability. If the system guesses that it is in the outer
ring of width $\delta$, then it moves inward as indicated by the arrows. Otherwise, the
system guesses that it is in some backprojection, one of which is shown.

Consider the task of attaining an open disk in the plane. Assume that the velocity
uncertainty is given by a ball with radius $\epsilon = \frac{1}{4}|v_0|$, where $v_0$ is the commanded
velocity, as usual. If the disk has radius $r$, this says that one can backproject
along any direction, and obtain a cone with apex at distance $4r$ from the center
of the disk. See figure 4.7. Suppose that the disk is recognizable once entered.
Then each of these backprojections is actually a preimage, relative to a termination
predicate that only checks for disk attainment. This says that there is a collection
of preimages whose interiors cover the open ball $B_{4r}$ of radius $4r$ centered at the
center of the disk. However, the interiors of the preimages do not cover the closed
ball of radius $4r$. Thus, if the initial position is known to lie in $B_{4r}$, the finite version
of SELECT does not apply, that is, there is no finite collection of preimages that
covers $B_{4r}$. We note in passing that this need not be a problem if the starting
position is probabilistically distributed and constraint (4.3) holds. In other words, for
some probabilistic distributions, SELECT can successfully choose between the infinite
collection of preimages, by, for instance, guessing the angle of approach. However, in
the non-deterministic setting, constraint (4.2) does not hold, and one really does need
some finite version of SELECT. To see that this is possible, imagine shrinking the ball
$B_{4r}$ slightly, so that it only has radius $4r - \delta$, with $0 < \delta < r$. Now a finite number
of preimages covers this ball $B_{4r-\delta}$. Thus, whenever the actual position lies in $B_{4r-\delta}$,
the probability that SELECT will guess the correct preimage is uniformly bounded
away from zero. Furthermore, one can split the annulus $B_{4r} - B_{4r-\delta}$ into a finite
number of regions, each of which is a preimage of the ball $B_{4r-\delta}$ for some velocity and a
termination predicate that keeps track of time. Thus one can change the problem into
a multi-step multi-guess randomization, and ensure that constraint (4.2) is satisfied.
See figure 4.8. This approach applies more generally.
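
The apex distance quoted in this example can be checked with elementary geometry. The following is a brief worked step, not taken from the original text: the half-angle symbol $\theta$ and the name $d_{\mathrm{apex}}$ are introduced here only for illustration. The bounding rays of a backprojection are tangent to the disk and make angle $\theta$ with the commanded direction, where $\theta$ is the half-angle of the cone of possible velocity directions.

```latex
% theta: half-angle of the velocity uncertainty cone (illustrative notation)
\sin\theta = \frac{\epsilon}{|v_0|},
\qquad
d_{\mathrm{apex}} = \frac{r}{\sin\theta} = \frac{r\,|v_0|}{\epsilon},
\qquad
d_{\mathrm{apex}} = 4r \iff \epsilon = \tfrac{1}{4}\,|v_0| .
```

Under this reading, an apex distance of $4r$ corresponds to a velocity error ball whose radius is one quarter of the commanded speed.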


4.3  Summary 

This  chapter  explored  in  the  continuous  domain  the  analogue  to  the  randomized 
strategies  developed  in  chapter  3  for  the  discrete  domain.  The  chapter  first  reviewed 
the  LMT  preimage  methodology  for  planning  guaranteed  strategies  in  the  presence  of 
uncertainty.  This  framework  was  used  as  the  basis  for  defining  randomized  strategies, 
much  as  dynamic  programming  was  used  in  the  discrete  domain.  One  of  the  difficulties 
in  the  continuous  case  is  the  need  for  randomizing  between  a  possibly  infinite  number 
of  decisions.  The  chapter  exhibited  conditions  under  which  infinite  decisions  still 
yield  non-zero  probabilities  of  success.  Further,  it  was  shown  that  in  many  cases 
apparently  infinite  decisions  may  be  reduced  to  a  finite  number  of  choices. 


Chapter  5 

Diffusions  and  Simple  Feedback 
Loops 


In this chapter we will explore the continuous version of discrete random walks. In
continuum spaces the natural analogue to a random walk is a diffusion process.

We  saw  in  the  discrete  setting  that  random  walks  on  graphs  constitute  a  simple 
type  of  randomized  strategy,  in  which  the  future  behavior  of  the  system  depends 
probabilistically  only  on  the  current  state  and  not  on  any  past  states.  Simple  feedback 
loops  constitute  an  important  class  of  random  walks,  whenever  the  control  and  sensing 
errors  are  probabilistically  distributed.  Recall  that  in  a  simple  feedback  loop  the 
current  action  to  be  executed  is  determined  solely  as  a  function  of  current  sensory 
information,  without  any  reference  to  previous  sensory  values. 

We  already  caught  a  glimpse  of  the  behavior  of  a  continuous  simple  feedback 
loop  in  the  introductory  example  of  section  2.4.  In  that  example  we  assumed  an 
error  distribution  consisting  of  a  fixed  bias.  More  generally,  we  would  like  to  have 
a  language  for  describing  the  behavior  of  the  strategy  of  that  example  for  various 
distributions.  In  particular,  we  would  like  to  determine  the  convergence  times  of  the 
strategy  for  certain  simple  common  error  distributions,  such  as  unbiased  Gaussians. 
Fast  specialized  strategies  are  known  in  these  cases.  The  randomized  strategy  is 
formulated  to  succeed  independent  of  the  actual  error  distributions,  so  long  as  these 
distributions  satisfy  certain  bounds.  However,  the  speed  of  convergence  of  the  strategy 
depends  on  the  actual  error  distributions.  If  the  speed  of  convergence  is  reasonably 
quick  in  the  simple  settings  then  it  makes  sense  to  employ  the  generally  applicable 
randomized  strategy  rather  than  to  seek  and  employ  a  specialized  fast  strategy  for 
each  possible  instantiation  of  error  distributions. 

In  the  discrete  setting  the  notions  of  progress  measure  and  expected  progress 
provided  a  convenient  tool  for  discussing  the  behavior  and  convergence  times  of 
random  walks.  These  same  notions  carry  over  to  the  continuous  setting.  Indeed 
the  notion  of  an  expected  local  velocity  arises  in  the  very  definition  of  a  diffusion 
process. 

This  chapter  will  briefly  review  some  basic  facts  from  diffusion  theory,  then  turn  to 
examples.  We  will  not  restate  or  reprove  all  the  results  from  discrete  random  walks  in 
the continuous setting. Instead, the main aim of this chapter is to develop an approach
for  analyzing  simple  feedback  strategies  of  the  type  discussed  in  section  2.4.  These 
strategies  execute  actions  designed  to  make  progress  along  some  progress  measure 
whenever  the  current  sensory  information  permits  such  progress,  and  otherwise  they 
execute  a  randomizing  action.  The  randomizing  actions  ensure  that  the  system  will 
not  become  stuck  hopelessly  in  some  region  from  which  progress  is  impossible.  We 
will  focus  in  particular  on  a  fairly  detailed  analysis  of  the  randomized  strategy  for 
attaining  a  two-dimensional  hole,  as  in  the  example  of  section  2.4. 


5.1  Diffusions 

A  diffusion  process  is  basically  the  continuous  version  of  a  random  walk.  The 
important  quantities  that  govern  the  behavior  of  a  diffusion  process  are  the  local 
drift  and  variance  at  each  point  in  the  state  space.  These  measure  the  expected 
velocity  at  each  point  and  the  variance  in  that  expectation.  Additionally,  diffusion 
processes  satisfy  a  continuity  requirement,  which  ensures  that  nearly  all  sample  paths 
of  the  process  are  continuous.  This  requirement  therefore  excludes  processes  that 
make  random  jumps,  such  as  the  first  randomized  strategy  suggested  for  the  example 
of  section  2.4.  Other  processes  excluded  are  those  in  which  history  plays  a  role  in 
determining  the  future  behavior  of  the  system.  In  other  words,  diffusion  processes 
must  be  Markovian. 

The following material is a condensed version of the discussion of diffusion
processes found in [KT2] and [FellerII].

First, let us assume that the state space of our diffusion process is $\Re^n$, and let us
denote the process by $\{X(t), t \ge 0\}$. In other words, $X(t) \in \Re^n$ is a random variable
describing the state of the system at time $t$. The Markovian nature of the process
means that there exists a function $Q_{t,\Delta t}(x, y)$, which describes the probabilistic
transition function of the process over the time interval $[t, t + \Delta t]$. This function
plays the role of the probability matrix $(p_{ij})$ in the discrete case. In particular, if the
system is in state $x$ at time $t$, then the probability that it will be in a state in the set
$Y$ at time $t + \Delta t$ is given by

    $\int_Y Q_{t,\Delta t}(x, y)\, dy.$

The continuity condition then takes the form,

    $\lim_{\Delta t \downarrow 0} \frac{1}{\Delta t} \int_{|y - x| > \delta} Q_{t,\Delta t}(x, y)\, dy = 0,$

for all positive $\delta$.

In keeping with standard notation, we will use the symbol "$E[\,\cdot\,]$" to denote the
expectation of whatever "$\cdot$" represents. The probability space for computing this
expectation will generally be clear from context. Following [KT2], we will let $\Delta_h X(t)$
be the change in the process over time interval $h$, that is, $\Delta_h X(t) = X(t + h) - X(t)$.




The local or infinitesimal drift $\mu(x, t)$ and variance $\sigma^2(x, t)$ are given by the
formulas:¹

(5.1)  $\mu(x, t) = \lim_{h \downarrow 0} \frac{1}{h}\, E[\Delta_h X(t) \mid X(t) = x],$

(5.2)  $\sigma^2(x, t) = \lim_{h \downarrow 0} \frac{1}{h}\, E[\{\Delta_h X(t)\}^2 \mid X(t) = x].$

Here $\mu$ is a vector of dimension $n$ that represents the expected velocity of the process,
while $\sigma^2$ and $\{\Delta_h X(t)\}^2$ are matrices of dimension $n \times n$ that essentially measure
the autocorrelation of the process. We will also refer to local drift as expected
infinitesimal velocity and as expected velocity, indicating that this notion of velocity
is a probabilistic average over possible displacements.

As pointed out in [KT2], under certain regularity conditions, a Markov process is
known to be a diffusion process if the following condition holds for some $p > 2$. The
limit is assumed to exist uniformly in $x$ over any compact subset of the state space.

(5.3)  $\lim_{h \downarrow 0} \frac{1}{h}\, E[\,|\Delta_h X(t)|^p \mid X(t) = x] = 0.$

5.1.1  Convergence  to  Diffusions 

For  the  applications  in  which  we  are  interested  the  resulting  strategies  are  not 
diffusion  processes.  This  is  because  sensing  and  action  generally  occur  at  discrete 
time  intervals,  rather  than  continuously,  so  that  the  process  is  not  strictly  speaking 
Markovian  at  each  location  and  instant  of  time.  A  correct  description  of  these 
strategies  would  therefore  model  each  as  a  sequence  of  actions  executed  in  a  continuous 
space  at  discrete,  not  necessarily  regularly  spaced,  time  intervals.  For  each  action 
one  would  define  a  probability  transition  kernel  Q  as  above,  then  chain  several  such 
actions together by convolving these kernels in the manner outlined by the
Chapman-Kolmogorov equation.² However, this approach obscures some of the basic issues that
are  of  concern  to  us,  namely  whether  the  process  is  making  progress  towards  the  goal, 
and  if  so,  how  fast  it  is  moving.  Fortunately,  many  discrete  time  processes  may  be 
thought  of  as  part  of  a  sequence  of  such  processes  that  converges  to  a  diffusion  process. 
In  such  cases  the  diffusion  process  may  well  approximate  the  discrete-time  process. 
In  these  cases,  an  analysis  of  the  diffusion  process  provides  as  well  an  approximate 
analysis  of  the  discrete-time  process.  We  will  not  worry  about  the  details  of  such 
approximations,  but  simply  point  to  [KT2]  for  a  brief  introduction.  In  this  chapter 
we  will  assume  that  our  discrete  representations  may  be  approximated  by  diffusion 
processes. The reasonableness of this assumption will become clear once we exhibit a
process and note how its dependence on small increments of time $h$ satisfies conditions
(5.1), (5.2), and (5.3).

¹ It is assumed that these limits exist.
² See [FellerII], page 322.

For  the  sake  of  example,  we  present  here  the  convergence  of  a  sequence  of  discrete 
random  walks  to  the  most  basic  of  diffusion  processes,  namely  the  Brownian  motion. 
This  example  is  taken  from  [Ross],  page  184. 

We define a random walk on the real line $\Re$ with cycle time $\Delta t$ and step size
$\Delta x$. This means that at each of the discrete points in time $\Delta t, 2\Delta t, \ldots$ the process
will move either to the right or to the left by $\Delta x$, each possibility occurring with
probability 1/2. The process initially starts off at the origin. Let $X(t)$ be the random
variable denoting the position of the process at time $t$. Then


    $X(t) = \Delta x\,(X_1 + \cdots + X_{\lfloor t/\Delta t \rfloor}),$

where $X_i$ is determined by the $i$th step, that is, $X_i$ is either $-1$ or $+1$, each with
probability 1/2. The $X_i$ are assumed to be independent. Therefore $E[X_i] = 0$ and
$E[X_i^2] = 1$ for each $i$. Thus, observe that

    $E[X(t)] = \Delta x \sum_{i=1}^{\lfloor t/\Delta t \rfloor} E[X_i] = 0,$

and

(5.4)  $\mathrm{Var}(X(t)) = E[(X(t))^2] = (\Delta x)^2 \sum_{i=1}^{\lfloor t/\Delta t \rfloor} E[X_i^2] = (\Delta x)^2 \lfloor t/\Delta t \rfloor.$

Now suppose that one lets both $\Delta x$ and $\Delta t$ go to 0. This cannot be done
arbitrarily, since the variance (5.4) should go neither to zero, which would imply
a deterministic and, in this case, unmoving process, nor to infinity, which would
imply complete uncertainty. This says that one must, in the limit, take $\Delta x = c\sqrt{\Delta t}$,
for some constant $c > 0$. In that case $E[X(t)] = 0$, while $\mathrm{Var}(X(t))$ converges to $c^2 t$.

The resulting process is a diffusion process known as Brownian motion. Observe
that the central limit theorem implies that $X(t)$ is normally distributed, with mean
0 and variance $c^2 t$.

A similar limiting procedure may be used to obtain a Brownian motion with
non-zero infinitesimal drift.
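
The scaling $\Delta x = c\sqrt{\Delta t}$ can be checked numerically. The following is a minimal simulation sketch, not part of the original text: the function names `walk_position` and `sample_moments`, the seed, and the parameter values are all illustrative assumptions. It draws many independent walks and compares the sample mean and variance of $X(t)$ with the limiting values $0$ and $c^2 t$.

```python
import math
import random


def walk_position(t, dt, c, rng):
    """Position at time t of a symmetric random walk with step dx = c*sqrt(dt)."""
    dx = c * math.sqrt(dt)
    # Number of completed steps, floor(t/dt); the small epsilon guards
    # against floating-point round-off in the division.
    steps = int(t / dt + 1e-9)
    return dx * sum(rng.choice((-1, 1)) for _ in range(steps))


def sample_moments(t, dt, c, trials, seed=0):
    """Sample mean and variance of X(t) over independent walks."""
    rng = random.Random(seed)
    xs = [walk_position(t, dt, c, rng) for _ in range(trials)]
    mean = sum(xs) / trials
    var = sum((x - mean) ** 2 for x in xs) / trials
    return mean, var


if __name__ == "__main__":
    mean, var = sample_moments(t=1.0, dt=0.01, c=2.0, trials=4000, seed=1)
    # The mean should be close to 0 and the variance close to c^2 * t = 4.
    print(mean, var)
```

Halving `dt` while keeping `c` fixed should leave the sample variance near $c^2 t$, which is the sense in which the sequence of walks has a non-degenerate limit.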

5.1.2  Expected  Convergence  Times

In  the  discrete  setting  we  computed  convergence  times  by  setting  up  a  set  of  linear 
equations  that  related  the  expected  convergence  times  at  different  states.  The 
coefficients  in  these  linear  equations  were  determined  by  the  transition  probabilities. 
In  the  continuous  setting  the  analogue  of  a  set  of  linear  equations  is  a  linear  differential 
equation.  For  diffusions,  the  coefficients  of  this  linear  differential  equation  are 
determined  by  the  infinitesimal  parameters.  Solving  the  linear  differential  equation 
with  appropriate  boundary  conditions  yields  the  expected  times  to  reach  some  goal. 
This material may be found in any standard text on diffusions. See for instance [KT2]
or [DynYush]. We will focus on time-homogeneous diffusions. This simply means
that the transition kernels $Q_{t,\Delta t}$ are independent of $t$, implying that the infinitesimal
parameters are independent of $t$ as well.

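Given a diffusion in $\Re^n$ with infinitesimal parameters $\mu(x)$ and $\sigma^2(x)$, one can
define a linear operator $L$, whose coefficients are determined by these parameters.
Let us write a point of the state space as $x = (x_1, \ldots, x_n)$. Correspondingly, we have

    $\mu(x) = (\mu_1(x), \ldots, \mu_n(x)),$

and

    $\sigma^2(x) = \begin{pmatrix} \sigma^2_{11}(x) & \cdots & \sigma^2_{1n}(x) \\ \vdots & & \vdots \\ \sigma^2_{n1}(x) & \cdots & \sigma^2_{nn}(x) \end{pmatrix}.$

Here $\sigma^2_{ij}(x)$ is the infinitesimal cross-correlation of $x_i$ and $x_j$, determined by equation
(5.2). The second-order linear operator $L$ is then given by

(5.5)  $L = \frac{1}{2} \sum_{i,j} \sigma^2_{ij}(x)\, \frac{\partial^2}{\partial x_i\, \partial x_j} + \sum_{i} \mu_i(x)\, \frac{\partial}{\partial x_i}.$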


Now consider an open region $\Omega$ in $\Re^n$ with boundary $\partial\Omega$. Subject to certain
regularity conditions, the expected time to exit the region from a point $x \in \Omega$ is
given by the function $\tau(x)$, where $\tau$ satisfies the following partial differential equation
and boundary conditions

(5.6)  $L\tau(x) = -1,$

(5.7)  with $\tau(x) = 0$ for $x \in \partial\Omega$.

More complicated boundary conditions may apply for more complicated behaviors.
For instance, the boundary may consist of two parts, one of which defines the
boundary of the goal, $\partial G$, and the other of which simply specifies the edge of the
workspace, $\partial W$. This corresponds in the discrete case to the random walk example of
page 124. There we were interested in attaining the origin, while specifying reflection
of the random walk at the endpoint $a$. Similarly, in a continuous domain, one would
specify $\tau(x) = 0$ for $x \in \partial G$, and $\frac{\partial \tau}{\partial n}(x) = 0$ for $x \in \partial W$. Here $n(x)$ is the outward
normal to the boundary $\partial W$ at the point $x$. Insisting that the normal derivative
of $\tau$ be zero at the boundary is the manner in which one specifies that the process
reflects at the boundary.³ In general, one cannot specify the boundary conditions
arbitrarily. For instance, certain points on the boundary may not be reachable, given
the intensity of the expected drift near these points.

Notice that for pure Brownian motion in $\Re^n$, with unit variance, the differential
equation (5.6) reduces to a form of Poisson's equation:

(5.8)  $\nabla^2 \tau = -2,$

with appropriate boundary conditions.
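
To make the reduction explicit: for pure Brownian motion with unit variance the drift vanishes and the variance matrix is the identity, so the operator (5.5) collapses to half the Laplacian. A short worked step, using only the definitions given above (here $I_n$ denotes the $n \times n$ identity matrix):

```latex
\mu(x) = 0, \quad \sigma^2(x) = I_n
\;\Longrightarrow\;
L = \frac{1}{2} \sum_{i=1}^{n} \frac{\partial^2}{\partial x_i^2} = \frac{1}{2}\,\nabla^2,
\qquad\text{so}\qquad
L\tau = -1 \iff \nabla^2 \tau = -2 .
```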

5.1.3  Brownian  Motion  on  an  Interval 

Solving  equation  (5.6)  can  be  a  formidable  task.  A  common  approach  is  to  use  the 
method  of  Green’s  functions.  However,  for  some  examples  the  differential  equation 
is  easily  solvable.  One  such  case  is  given  whenever  the  coefficients  in  the  operator  L 
are constants. We will look at this case for a diffusion on a subset of the real line,
namely the interval $[0, a]$. This example will also demonstrate the relationship of the
discrete and continuous cases. Recall the discrete case was analyzed on page 124.

With constant infinitesimal parameters, the one-dimensional diffusion is simply a
Brownian motion with drift. Let us denote by $\sigma^2$ the constant infinitesimal variance,
and by $\mu$ the constant infinitesimal drift. Note that $\sigma^2$ is non-negative, but $\mu$ can
be any real number. We will assume that the goal is given by the origin, and that
reflection occurs at the point $a$. Thus equation (5.6) becomes:

(5.9)  $\frac{1}{2}\,\sigma^2\, \tau''(x) + \mu\, \tau'(x) = -1,$

with boundary conditions $\tau(0) = 0$ and $\tau'(a) = 0$.

First, let us deal with some special cases. If $\sigma^2 = 0$, then the process is
deterministic. In particular, the system moves along the real line with velocity $\mu$. If
this velocity is strictly negative, then the origin can be attained, otherwise it cannot.
Thus, whenever $\mu < 0$, we see that a solution to (5.9) is given by $\tau(x) = -x/\mu$. Notice
that in this case the second boundary condition $\tau'(a) = 0$ cannot be satisfied, which
is consistent with the fact that (5.9) is now a first-order linear differential equation.

³ See [DynYush], page 149.

With this special case out of the way, let us assume that $\sigma^2 > 0$. First, let us turn
to the case of a pure Brownian motion, with no drift, that is, with $\mu = 0$. In that
case, the solution to (5.9) is given by

    $\tau(x) = \frac{1}{\sigma^2}\, x\,(2a - x).$

Notice the same quadratic character of the solution that we observed in the discrete
setting.

Finally, in the case that $\sigma^2 > 0$ and $\mu \neq 0$, we have that

    $\tau(x) = -\frac{x}{\mu} + \frac{\sigma^2}{2\mu^2}\, e^{2\mu a/\sigma^2} \left(1 - e^{-2\mu x/\sigma^2}\right).$

Again notice the strong similarity to the discrete case. In particular if $\mu$ is negative,
then $\tau(x) < -x/\mu$. Furthermore, if $a$ is fairly large with $x \ll a$, then $\tau(x) \approx -x/\mu$.
In other words, the expected time to reach the origin is essentially the distance to the
origin, divided by the expected velocity of approach. Thus, with drift in the correct
direction, the diffusion behaves almost like a deterministic process.
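
The three cases of equation (5.9) can be collected into one small routine and checked numerically. The sketch below is illustrative only: the function names `expected_time` and `ode_residual` are hypothetical, and the expression used for non-zero drift is the one obtained by integrating (5.9) with the boundary conditions $\tau(0) = 0$ and $\tau'(a) = 0$, which should be compared against the source before being relied upon. The residual function evaluates the left side of (5.9) plus one by central differences, so it should be close to zero wherever the closed form is correct.

```python
import math


def expected_time(x, a, mu, sigma2):
    """Expected time to reach 0 from x in [0, a], with reflection at a."""
    if sigma2 == 0.0:
        # Deterministic motion: the origin is reached only if mu < 0.
        if mu >= 0.0:
            raise ValueError("origin is not attained when sigma2 == 0 and mu >= 0")
        return -x / mu
    if mu == 0.0:
        # Pure Brownian motion: quadratic in x.
        return x * (2.0 * a - x) / sigma2
    k = 2.0 * mu / sigma2
    # Equivalent to (sigma2 / (2 mu^2)) * exp(k a) * (1 - exp(-k x)),
    # written so that both exponents stay non-positive when mu < 0.
    correction = (sigma2 / (2.0 * mu * mu)) * (math.exp(k * a) - math.exp(k * (a - x)))
    return -x / mu + correction


def ode_residual(x, a, mu, sigma2, h=1e-4):
    """Central-difference estimate of (1/2) sigma2 tau'' + mu tau' + 1 at x."""
    tau = lambda s: expected_time(s, a, mu, sigma2)
    d1 = (tau(x + h) - tau(x - h)) / (2.0 * h)
    d2 = (tau(x + h) - 2.0 * tau(x) + tau(x - h)) / (h * h)
    return 0.5 * sigma2 * d2 + mu * d1 + 1.0


if __name__ == "__main__":
    # With drift towards the origin the time stays below the deterministic x/|mu|.
    print(expected_time(0.5, 1.0, -1.0, 0.5), 0.5 / 1.0)
    print(ode_residual(0.5, 1.0, -1.0, 0.5))
```

As the drift shrinks towards zero, the drifted expression approaches the quadratic driftless one, which gives a second consistency check on the formulas.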

We  see  then  that  the  infinitesimal  drift  in  the  continuous  setting  has  strong 
similarities  to  the  expected  velocity  at  a  state  in  the  discrete  setting.  These  similarities 
carry  over  to  the  labelling  of  states  by  one-dimensional  quantities  and  to  the  expected 
infinitesimal  velocity  relative  to  such  labellings.  In  particular,  one  can  transform  the 
state  space  so  that  the  labelling  corresponds  to  the  expected  time  to  attain  some  goal. 
In  general,  given  a  smooth  non-negative  labelling  that  is  zero  at  the  goal,  if  the  local 
drift  relative  to  this  labelling  is  negative  at  every  point  and  uniformly  bounded  away 
from  zero,  then  one  can  obtain  a  simple  upper  bound  for  the  expected  time  to  reach 
the goal. We will not formally develop these issues in the continuous setting, merely
take our lead from the discrete results.⁴


5.1.4  The  Bessel  Process 

A very important diffusion process is the Bessel process. This process is a
one-dimensional diffusion that measures the distance from the origin of a point undergoing
a pure Brownian motion in $\Re^n$. Our interest in this process stems directly from the
natural labelling provided by a distance measure. In particular, if one can execute a
randomized strategy that makes sufficient expected progress relative to a measure of
distance from the goal, then one can be assured of essentially linear convergence times.
In other words, the expected convergence times are proportional to the distance from
the goal, or better. The simple feedback loop introduced in section 2.4 provides a
two-dimensional instantiation of this problem, one which we will examine extensively
in the rest of this chapter.

Unfortunately, pure random motions will not make local progress relative to the
distance measure in $\Re^n$. We saw this in the discrete setting, and it appears again in the
continuous setting. The expected convergence times to attain a ball about the origin⁵
are on the order of $|x|^n$. Thus one would need to dilate the state space polynomially in
order to see local progress. In order to obtain convergence times that are linear in $|x|$
one must hope that one's sensors are good enough to overcome the natural outward
drift of a randomized strategy. Clearly, this is not always possible. For instance, in
the example in section 2.4, for certain approach directions the sensing bias forces the
system into a region within which sensing is useless. Only pure randomizing motions
are possible. Of course, the strategy is guaranteed eventually to attain the goal,
independent of the sensor distribution. One question is whether for sufficiently nice
sensors the strategy converges quickly. In answering that question, the Bessel process
plays an integral role.

⁴ In order to briefly indicate the relationship between the continuous and discrete settings, suppose
that we define the expected infinitesimal velocity relative to some labelling $\ell : \Omega \to \Re$ of the state
space as $\nu(x) = \lim_{h \downarrow 0} \frac{1}{h} E[\Delta_h \ell(X(t)) \mid X(t) = x]$. This is the natural analogue of the expected
infinitesimal velocity in $\Re^n$. The theory of semi-groups tells us that in fact $\nu(x) = (L\ell)(x)$, where $L$
is the linear operator (5.5) associated with the diffusion. Thus, if we set $\ell(x) = \tau(x)$, the expected
time to attain the goal, then essentially by definition (equation (5.6)) the expected infinitesimal
velocity relative to the labelling must be constant, that is, $\nu(x) = -1$ for all $x \in \Omega$. This is precisely
the result that we proved in the discrete case. See [FellerII] or [KT2] for a discussion of semi-groups.

Let us define the Bessel process and exhibit its infinitesimal parameters. We will
not derive these parameters nor prove that the Bessel process is indeed a diffusion,
but instead refer the interested reader to [KT1] and [KT2]. Later, when we examine
the two-dimensional feedback strategy, we will essentially derive the infinitesimal
parameters as part of a more complicated derivation.

Let $X(t) = (X_1(t), \ldots, X_n(t))$ denote a pure Brownian motion in $\Re^n$. Thus the
infinitesimal parameters of $X(t)$ are given by

    $\mu(x) = 0,$
    $\sigma^2(x) = \sigma^2 I_n,$

where $I_n$ is the $n \times n$ identity matrix.

The Bessel process is given by $\{Y(t), t \ge 0\}$, with

    $Y(t) = \sqrt{\sum_{i=1}^{n} X_i(t)^2}.$

The infinitesimal parameters of this process are

    $\mu_Y(y) = \frac{n - 1}{2y}\, \sigma^2,$

    $\sigma_Y^2(y) = \sigma^2.$

In other words, the infinitesimal variance is the same as for the underlying Brownian
motion, but now there is a natural drift away from the origin that is inversely
proportional to the distance from the origin.
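
The outward drift can be observed directly by simulation. The following is a rough Monte Carlo sketch under stated assumptions: the function names `estimate_radial_drift` and `bessel_drift`, the step size, and the sample count are illustrative choices, and the reference value is the standard Bessel drift $(n-1)\sigma^2/(2y)$. Each trial moves a point at distance $y$ from the origin by one Gaussian increment of variance $\sigma^2 h$ per coordinate and records the change in distance.

```python
import math
import random


def estimate_radial_drift(n, y, sigma, h, trials, seed=0):
    """Monte Carlo estimate of (1/h) E[Y(t+h) - Y(t) | Y(t) = y]."""
    rng = random.Random(seed)
    step_std = sigma * math.sqrt(h)  # each coordinate moves by N(0, sigma^2 h)
    total = 0.0
    for _ in range(trials):
        # Start at distance y along the first axis; by rotational symmetry
        # the choice of direction does not matter.
        first = y + rng.gauss(0.0, step_std)
        rest_sq = sum(rng.gauss(0.0, step_std) ** 2 for _ in range(n - 1))
        total += math.sqrt(first * first + rest_sq) - y
    return total / (trials * h)


def bessel_drift(n, y, sigma):
    """Reference value (n - 1) sigma^2 / (2 y)."""
    return (n - 1) * sigma ** 2 / (2.0 * y)


if __name__ == "__main__":
    est = estimate_radial_drift(n=3, y=1.0, sigma=1.0, h=0.01, trials=100000, seed=1)
    print(est, bessel_drift(3, 1.0, 1.0))
```

The estimate is noisy because the per-sample fluctuation scales like $\sigma/\sqrt{h}$, so many trials are needed; the point of the sketch is only that the average displacement is outward even though the underlying motion has no drift.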

⁵ This assumes that $n > 2$ and that the domain of diffusion $\Omega$ is bounded, say, $\Omega = B_n$, the
unit ball in $\Re^n$. For $\Re^1$ the expected time is on the order of $|x|^2$, while for $\Re^2$ it is on the order of
$|x|^2 \log|x|$, as we have already noted.

In deriving these parameters, one approach is to first determine the infinitesimal
parameters for the process $Z(t) = Y(t)^2$ from basic principles, then use the following
fact⁶ to obtain the infinitesimal parameters for $Y(t)$. Specifically, if $Z(t)$ is a regular⁷
one-dimensional diffusion on some interval in $\Re$ with infinitesimal parameters $\mu_Z(z)$
and $\sigma_Z^2(z)$, and if $g : \Re \to \Re$ is a strictly monotone function with continuous second
derivative, then $Y(t) = g(Z(t))$ is a regular diffusion with infinitesimal parameters

(5.10)  $\mu_Y(y) = \frac{1}{2}\, \sigma_Z^2(z)\, g''(z) + \mu_Z(z)\, g'(z),$

(5.11)  $\sigma_Y^2(y) = \sigma_Z^2(z)\, (g'(z))^2,$

where $y = g(z)$.
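
As an illustration of how (5.10) and (5.11) produce the Bessel parameters, the following sketch applies them with $g(z) = \sqrt{z}$ to $Z(t) = Y(t)^2$. The parameters of $Z$ used here, $\mu_Z(z) = n\sigma^2$ and $\sigma_Z^2(z) = 4\sigma^2 z$, are the standard values for the squared norm of an $n$-dimensional Brownian motion with per-coordinate variance $\sigma^2$; they are stated here as an assumption rather than derived.

```latex
\begin{align*}
g(z) &= \sqrt{z}, \qquad g'(z) = \frac{1}{2\sqrt{z}}, \qquad g''(z) = -\frac{1}{4 z^{3/2}}, \\
\mu_Y(y) &= \tfrac{1}{2}\,(4\sigma^2 z)\left(-\frac{1}{4 z^{3/2}}\right) + n\sigma^2 \cdot \frac{1}{2\sqrt{z}}
          = \frac{(n-1)\,\sigma^2}{2\sqrt{z}} = \frac{(n-1)\,\sigma^2}{2y}, \\
\sigma_Y^2(y) &= 4\sigma^2 z \cdot \frac{1}{4z} = \sigma^2 .
\end{align*}
```

The result is a variance equal to that of the underlying Brownian motion and an outward drift inversely proportional to $y$.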

5.2  Relationship  of  Non-Deterministic  and 
Probabilistic  Errors 

Before  we  are  able  to  analyze  the  two-dimensional  simple  feedback  loop  of  section  2.4 
for  nice  error  distributions,  we  must  settle  on  some  relationship  between  the  model 
of  non-deterministic  error  assumed  by  the  strategy  and  any  probabilistic  distribution 
of  errors.  This  will  not  be  as  straightforward  as  one  might  wish,  and  we  will  have  to 
make  some  arbitrary  choices. 

Recall  that  the  model  of  uncertainty  assumed  by  the  preimage  methodology 
and  by  the  randomizing  example  was  that  of  unknown  but  bounded  uncertainty. 
In  other  words,  actual  values  were  assumed  to  lie  in  some  uncertainty  ball  about 
nominal  values,  but  no  particular  distribution  of  errors  was  assumed.  However,  in 
any  particular  case,  the  error  will  be  distributed  in  some  specific,  although  possibly 
unknown  fashion  about  the  nominal  value.  Consider  the  case  of  a  probabilistic  error 
distribution.  Suppose,  for  instance,  that  the  non-deterministic  model  of  error  is  a 
sensing error ball of the form $B_\epsilon(x)$. This means that whenever the actual position
is $x$, the sensor returns a value $x^*$ within distance $\epsilon$ of $x$. Suppose further that the
actual error distribution is centered at $x$. Let $p_r$ be the probability that the observed
sensor value $x^*$ will lie further than distance $r$ from the actual position $x$ of the
system. In symbols, $p_r = P\{|x^* - x| > r\}$. We would like to define the radius $\epsilon$
of the non-deterministic sensing error ball in terms of these probabilities. If there is
some $r$ for which $p_r = 0$, then it makes sense to take $\epsilon$ to be the infimum of all such
$r$. If $p_r > 0$ for all $r$, then one may wish to settle on some threshold $\delta$, and take $\epsilon$
to be the smallest $r$ for which $p_r < \delta$. Similarly, if $x^*$ is biased with unknown bias
whose magnitude is bounded by $b_{\max}$, then one may first wish to compute $r$ as above
assuming no bias, then take $\epsilon$ to be $\epsilon = r + b_{\max}$. The same approach applies to other
uncertainties, such as control uncertainty.

As  an  example,  consider  a  two-dimensional  sensor  with  a  normal  distribution.  In 
particular,  suppose  the  sensor  has  no  bias,  that  the  variances  along  the  two  axes 


6[KT2],  page  173. 

7This  means  that  every  point  is  reachable  from  every  other  point.  See  [KT2]. 
are identical, and that the measurements along the two axes are uncorrelated. This
means that if the actual location is at the origin, then the probability of seeing a
sensor value $(x_1, x_2)$ is given by the density function

$$p(x_1, x_2) = \frac{1}{2\pi\sigma^2}\,\exp\left(-\frac{x_1^2 + x_2^2}{2\sigma^2}\right),$$

where $\sigma^2$ is the variance of the measurement along each dimension. A reasonable
choice for the radius $\epsilon$ of the sensing uncertainty ball might be $\epsilon = 3\sigma$. This
corresponds to a certainty threshold of approximately 98.9%.
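
The 98.9% figure can be checked directly. For an unbiased two-dimensional Gaussian with equal, uncorrelated variances, the distance from the true position is Rayleigh distributed, so the tail probability is $p_r = \exp(-r^2/2\sigma^2)$. The sketch below uses that closed form; the function names and the `b_max` parameter are illustrative.

```python
import math

def tail_probability(r, sigma):
    """p_r = P(|x* - x| > r) for an unbiased isotropic 2-D Gaussian."""
    return math.exp(-(r * r) / (2.0 * sigma * sigma))

def error_radius(sigma, delta, b_max=0.0):
    """Radius at which p_r falls to the threshold delta, plus a bias bound.

    b_max is the assumed bound on the magnitude of an unknown bias.
    """
    return sigma * math.sqrt(-2.0 * math.log(delta)) + b_max

# Probability that a reading falls inside the ball of radius 3 * sigma.
coverage_3sigma = 1.0 - tail_probability(3.0, 1.0)
```

Here `coverage_3sigma` evaluates to $1 - e^{-4.5} \approx 0.989$. Conversely, `error_radius(sigma, delta)` recovers $3\sigma$ when `delta` is set to $e^{-4.5}$.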

5.2.1  Control  Uncertainty 

While  this  relationship  between  uncertainty  balls  and  probability  distributions  seems 
very  straightforward,  there  are  some  subtleties.  Let  us  focus  first  on  control 
uncertainty,  then  return  to  sensing  uncertainty.  Consider  again  a  two-dimensional 
problem  as  above,  and  suppose  that  whenever  one  commands  a  velocity  v,  the  actual 
velocity$^8$ is normally distributed, with the error distributions along the two axes being
independent and unbiased and having equal variances. This variance will generally
be a function of the commanded velocity, so we will denote it as $\sigma^2(\mathbf{v})$. Similarly,
within the unknown but bounded model of uncertainty, the actual velocity is assumed
to lie within some ball $B_\epsilon(\mathbf{v})$. Here too the radius $\epsilon$ is a function of the commanded
velocity $\mathbf{v}$. A common approach is to assume that this radius is proportional to the
magnitude of the commanded velocity. In short, the unknown but bounded model of
uncertainty would say that the actual velocity $\mathbf{v}^*$ satisfies

(5.12)  $|\mathbf{v}^* - \mathbf{v}| < \epsilon_v\,|\mathbf{v}|$,

for some constant $\epsilon_v > 0$. Rather than writing $B_{\epsilon_v |\mathbf{v}|}(\mathbf{v})$ for the set of $\mathbf{v}^*$ satisfying
constraint (5.12), we will henceforth write velocity uncertainty as $B_{\epsilon_v}(\mathbf{v})$, with the
understanding that $\epsilon_v$ refers to an error radius that is proportional to the magnitude
of the commanded velocity.

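In relating the error-ball and probabilistic models as we did above, one might
therefore take $3\sigma(\mathbf{v}) = \epsilon$. This says that $3\sigma(\mathbf{v}) = \epsilon_v\,|\mathbf{v}|$, and hence that
$\sigma(\mathbf{v}) = \frac{\epsilon_v}{3}\,|\mathbf{v}|$.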

We  should  try  to  interpret  these  formal  manipulations,  and  determine  whether 
they  make  any  physical  sense.  First,  let  us  observe  that  we  have  specified  uncertainty 
in  terms  of  velocity,  but  that  we  are  really  interested  in  position.  After  all,  an 
action  entails  executing  a  velocity  for  some  period  of  time.  Within  the  error-ball 
model of uncertainty, an action specifying nominal velocity $\mathbf{v}$ for time $\Delta t$ means that
the change in position is non-deterministically given by $\Delta t\,\mathbf{v}^*$, with $\mathbf{v}^* \in B_{\epsilon_v}(\mathbf{v})$.$^9$

8This  is  in  free  space.  In  contact  space,  we  modify  the  velocity  as  determined  by  the  generalized 
damper  equation  (4.1). 

9For  more  general  error  sets,  such  as  non-convex  error  sets,  this  is  not  correct,  but  generalizations 
are  possible. 
Said differently, the change in position is given by $\Delta\mathbf{x}$, with $\Delta\mathbf{x}$ distributed
non-deterministically in the ball $B_{\Delta t\,\epsilon_v |\mathbf{v}|}(\Delta t\,\mathbf{v})$. Suppose that we now translate this
non-deterministic representation into the probabilistic one for an unbiased Gaussian error,
in the manner just outlined. In two dimensions this means that the set of position
changes is distributed normally about $\Delta t\,\mathbf{v}$ with standard deviation $\sigma = \frac{\epsilon_v}{3}\,|\Delta t\,\mathbf{v}|$.

Looking  at  this  carefully,  we  should  notice  a  peculiarity  in  the  probabilistic  setting. 
In particular, if instead of commanding velocity $\mathbf{v}$ for time $\Delta t$, one repeatedly
commands velocity $\mathbf{v}$ for time $\Delta t/10000$, repeating this 10000 times, then one should
improve considerably the final accuracy of the desired motion. In particular, the
central limit theorem suggests that the final position will be distributed normally
about $\Delta t\,\mathbf{v}$, but now with standard deviation $\sigma/100$. Indeed, if one passes to a
diffusion process, that is, to a process in which the motion is commanded repeatedly
for infinitesimal amounts of time, the motion actually becomes deterministic. In order
to see this, consider the infinitesimal drift and variance. The infinitesimal drift is:

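$$\begin{aligned}
\mu(\mathbf{x}) &= \lim_{h \downarrow 0} \frac{1}{h}\,E[\Delta_h \mathbf{X}(t) \mid \mathbf{X}(t) = \mathbf{x}] \\
&= \lim_{h \downarrow 0} \frac{1}{h}\,h\,\mathbf{v} \\
&= \mathbf{v}.
\end{aligned}$$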

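In order to compute the infinitesimal variance, notice that we only need to compute
the variance in the $x_1$ direction (where $\mathbf{x} = (x_1, x_2)$). This is because the variance
in the $x_2$ direction is identical, and because the cross-correlations are zero by
independence. Thus, writing $\mathbf{v} = (v_1, v_2)$, we have that

$$\begin{aligned}
\sigma_{x_1}^2(\mathbf{x}) &= \lim_{h \downarrow 0} \frac{1}{h}\,E[(\Delta_h X_1(t))^2 \mid \mathbf{X}(t) = \mathbf{x}] \\
&= \lim_{h \downarrow 0} \frac{1}{h} \left\{ \frac{\epsilon_v^2}{9}\,|\mathbf{v}|^2 h^2 + h^2 v_1^2 \right\} \\
&= 0.
\end{aligned}$$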
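
The two effects just described can be made concrete with a little arithmetic. The sketch below is only an illustration under the stated Gaussian model with $\sigma = (\epsilon_v/3)\,|\mathbf{v}|\,\Delta t$ per command; the function names are ours. It computes the standard deviation of the final position when a motion of duration $\Delta t$ is split into $n$ independent commands, and the quotient whose limit defines the infinitesimal variance.

```python
import math

def position_std(eps_v, speed, dt, n_steps=1):
    """Std. dev. of the final position when dt is split into n_steps
    independent commands, each with std (eps_v / 3) * speed * (dt / n_steps)."""
    per_step = (eps_v / 3.0) * speed * (dt / n_steps)
    return math.sqrt(n_steps) * per_step  # variances of independent steps add

def variance_rate(eps_v, v1, speed, h):
    """(1/h) * E[(Delta_h X_1)^2] for a single command of duration h."""
    second_moment = (eps_v / 3.0) ** 2 * speed ** 2 * h ** 2 + (h * v1) ** 2
    return second_moment / h

single = position_std(0.3, 1.0, 1.0)           # one command of duration 1
split = position_std(0.3, 1.0, 1.0, 10000)     # 10000 commands of duration 1e-4
```

With these inputs `split` equals `single / 100`, and `variance_rate` shrinks linearly with `h`, which is the vanishing infinitesimal variance computed above.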

In  short,  we  see  that  the  expected  infinitesimal  velocity  is  just  the  commanded 
velocity,  and  that  the  infinitesimal  variance  is  zero.  This  implies  that  the  process 
moves  deterministically  from  x  in  the  direction  v,  which  does  not  agree  with  the  non- 
deterministic  error-ball  representation.  Thus  there  seems  to  be  a  conflict  between  the 
two representations. One view is that the problem arises in the non-deterministic
model because we have not modelled the velocity error radius as a function of
time, but only as a function of the commanded velocity. Another view is that the
problem arises in the probabilistic model, at least for Gaussian errors, because the
variance in the change in position is proportional to the square of time. In order to
model non-vanishing error, the change in time $\Delta t$ should appear with at most linear
order.  A  third  view  is  to  accept  the  apparent  discrepancy,  by  realizing  that  the  non- 
deterministic  model  may  simply  conservatively  overestimate  the  motion  error.  It  may 
indeed  be  the  case  that  the  errors  exist  as  stated  both  in  the  non-deterministic  and 
probabilistic  cases,  but  that  the  non-deterministic  model  simply  does  not  capture  the 
nice  averaging  effect  that  comes  into  play  by  the  central  limit  theorem.  After  all, 
the  non-deterministic  model  represents  a  whole  collection  of  possible  distributions, 
including  those  with  biases.  For  some  of  these  distributions  one  will  see  the  nice 
averaging  effect,  but  not  necessarily  for  all. 

Nonetheless,  this  leaves  us  with  a  choice  as  to  how  we  want  to  represent 
probabilistic  errors  once  we  pass  to  a  diffusion  analysis  of  randomized  strategies. 
One  possibility  that  reconciles  the  first  two  explanations  above,  is  to  model  the 
error  in  velocity  as  white  noise.  This  is  a  standard  approach  taken  in  the  study 
of  optimal  control  (see,  for  example,  [Stengel]).  Instead  of  having  an  error  that  grows 
proportionally  to  the  change  in  time,  one  has  an  error  that  grows  proportionally  to  the 
square-root  of  the  change  in  time.  While  this  implies  less  error  over  long  motions,  it 
captures the presence of non-zero error over infinitesimal amounts of time, that is, the
infinitesimal variance is non-zero. Thus, in terms of our previous representation, the
infinitesimal drift is $\mathbf{v}$, which is just the commanded velocity, while the infinitesimal
variance is of the form $\sigma_{x_1}^2 = \sigma_{x_2}^2 = \sigma^2 > 0$. This says that after $\mathbf{v}$ has been executed
for time $\Delta t$, the variance in position at that point is on the order of $\Delta t\,\sigma^2$. Relating
this to the non-deterministic model, we see that error balls in this case must be
modelled as functions of time, with the position error ball at time $\Delta t$ having radius
$\epsilon_v\,|\mathbf{v}|\,\sqrt{\Delta t}$.
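
To contrast this with the earlier model, the following sketch (function names are illustrative) computes the position standard deviation under the white-noise model, where each command of duration $\Delta t / n$ contributes variance $\sigma^2\,\Delta t / n$, together with the time-dependent error radius.

```python
import math

def white_noise_position_std(sigma, dt, n_steps=1):
    """Std. dev. of position after time dt when the velocity error is white
    noise with infinitesimal variance sigma**2; dt is split into n_steps."""
    per_step_variance = sigma ** 2 * (dt / n_steps)
    return math.sqrt(n_steps * per_step_variance)

def position_error_radius(eps_v, speed, dt):
    """Radius eps_v * |v| * sqrt(dt) of the time-dependent position error ball."""
    return eps_v * speed * math.sqrt(dt)

coarse = white_noise_position_std(0.1, 1.0)         # one command
fine = white_noise_position_std(0.1, 1.0, 10000)    # 10000 short commands
```

Unlike the model in which the error grows linearly in time, subdividing the motion leaves the accumulated error unchanged here: `coarse` and `fine` agree.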

For  simplicity  we  will  stick  to  the  error  model  that  does  not  capture  the  time- 
dependency.  This  implies  that  for  sufficiently  nice  velocity  distributions,  if  the 
commands  are  issued  quickly  enough  the  resulting  effect  will  be  a  deterministic 
motion.  That  may  seem  to  be  a  bit  generous.  However,  in  terms  of  the  diffusion 
analysis  later  in  this  chapter,  the  significant  terms  will  arise  from  the  variance 
associated  with  the  guessing  of  motions  rather  than  from  errors  in  the  commanded 
velocities.  Furthermore,  once  the  analysis  is  complete  it  will  be  easy  to  add  in  extra 
terms  that  capture  a  non-vanishing  infinitesimal  variance. 


5.2.2  Sensing  Uncertainty 

A  similar  problem  exists  in  reconciling  the  different  representations  of  sensor  errors. 
However,  the  problem  manifests  itself  slightly  differently.  In  particular,  a  strategy  that 
employs  an  unknown  but  bounded  representation  of  sensing  uncertainty  may  make 
decisions  that  differ  radically  depending  on  whether  the  sensed  value  lies  within  or 
beyond  some  distance  of  the  goal.  For  instance,  in  the  randomized  strategy  of  section 
2.4,  if  the  sensed  value  lies  outside  of  the  position  sensing  uncertainty  of  the  goal, 
then  the  strategy  will  move  towards  the  goal,  while  otherwise  it  will  execute  a  random 
motion.  One  immediate  problem  is  that  in  the  probabilistic  case  the  sensor  value 
may be distributed over an unbounded continuum. However, physical devices usually
have  a  limited  range,  so  the  approximation  of  using  a  sufficiently  large  multiple  of  the 
standard  deviation  as  the  error  radius  is  reasonable.  A  more  pronounced  problem  is 
given  by  the  time-dependence  of  sensor  errors.  For  instance,  suppose  that  a  sensor 
reading  is  polluted  with  white  noise  (or  some  physically  realizable  approximation).  In 
a  very  informal  sense,  white  noise  is  the  time  derivative  of  a  Brownian  motion.  The 
result  of  a  sensor  reading  is  determined  essentially  by  a  random  walk  in  the  space 
of  sensor  values,  but  normalized  by  the  time  required  to  obtain  the  sensor  reading. 
If  we  imagine  that  the  sensor  returns  a  reading  instantly,  then  the  variance  in  that 
reading  will  be  infinite.  It  is  only  by  averaging  over  a  finite  extent  of  time  that  the 
sensor  value  assumes  any  meaning.  However,  the  variance  of  the  error  in  this  reading 
is  time  dependent,  and  thus  so  is  the  radius  of  an  error  ball  in  the  non-deterministic 
representation. 

Let  us  make  all  of  this  a  little  more  precise.  Let  us  suppose  that  a  sensor  value  s 
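is computed by averaging a white noise process $\{\mathbf{w}(t), t \ge 0\}$ over some small time
interval. This means that$^{10}$

$$\mathbf{s} = \frac{1}{\Delta t} \int_{\Delta t} \mathbf{w}(t)\,dt.$$

Taking expectations, we see that $E[\mathbf{s}] = 0$, while$^{11}$

$$E[\mathbf{s}\mathbf{s}^T] = \frac{1}{\Delta t^2} \int_{\Delta t} \int_{\Delta t} E[\mathbf{w}(t)\,\mathbf{w}^T(\tau)]\,dt\,d\tau.$$

For a Gaussian white noise process, the covariance function is a delta-function,
since by definition white noise is completely uncorrelated over time. Thus

$$E[\mathbf{w}(t)\,\mathbf{w}^T(\tau)] = \Lambda\,\delta(t - \tau),$$

for some constant covariance matrix $\Lambda$. If the noise is symmetric and uncorrelated
across different dimensions, then $\Lambda$ is of the form $\Lambda = \lambda I_n$, for some positive constant
$\lambda$. In any event, we therefore see that

$$\begin{aligned}
E[\mathbf{s}\mathbf{s}^T] &= \frac{1}{\Delta t^2} \int_{\Delta t} \int_{\Delta t} \Lambda\,\delta(t - \tau)\,dt\,d\tau \\
&= \frac{\Lambda}{\Delta t}.
\end{aligned}$$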

In  short,  if  the  sensing  error  arises  from  a  white  noise  process,  then  the  variance 
in  the  error  depends  very  much  on  the  timing  constants  of  the  sensor.  In  particular, 
the  longer  the  averaging  process,  the  better  the  sensing  results.  This  implies  that  a 
strategy  that  assumes  a  bounded  error  ball  in  making  sensor-based  decisions  must  fix 

10See  [Brown],  page  254. 

11If $\mathbf{v}$ is a column vector of dimension $n$, then $\mathbf{v}^T$ denotes its transpose, and $\mathbf{v}\mathbf{v}^T$ is a matrix of
dimension $n \times n$. Thus $E[\mathbf{v}\mathbf{v}^T]$ measures all the covariance terms.
a particular minimum sensing time, in order to be sure that the results fall within
that  error  ball  with  high  enough  probability.  Said  differently,  for  every  sensing  error 
ball,  the  system  tacitly  is  assuming  a  certain  timing  characteristic  that  makes  the 
error  ball  valid.  One  must  therefore  be  careful  when  making  statements  that  involve 
small  changes  in  time.  These  simply  are  not  modelled  in  the  preimage  methodology 
as  set  forth  in  chapter  4.  Formally  one  could  include  time-dependent  sensors  fairly 
easily,  by  modelling  both  actions  and  sensory  operations  as  functions  of  time.  This 
does  however  complicate  the  description  of  preimages  since  now  the  response  time  of 
the  termination  predicate  plays  a  role.  This  path  leads  into  the  domain  of  control 
theory.  We  will  not  follow  this  path,  but  simply  assume  that  sensors  return  values 
instantaneously. 
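
The dependence of the error ball on the sensing time can be sketched numerically. The code below makes two assumptions that are ours: white noise of intensity $\lambda$ is approximated by independent samples of variance $\lambda / dt$ taken every $dt$ seconds, and the error radius is tied to the standard deviation by a fixed multiple (3, as in the earlier $3\sigma$ choice). All names are illustrative.

```python
import math

def averaged_noise_variance(lam, dt_avg, dt_sample):
    """Variance of the mean of discrete white-noise samples over dt_avg.

    Each sample stands in for delta-correlated noise of intensity lam and
    therefore has variance lam / dt_sample.
    """
    n = round(dt_avg / dt_sample)
    per_sample_variance = lam / dt_sample
    return per_sample_variance / n  # variance of a mean of n independent samples

def min_sensing_time(lam, eps, multiple=3.0):
    """Shortest averaging time for which multiple * std <= eps,
    using std = sqrt(lam / dt)."""
    return lam * (multiple / eps) ** 2

var_coarse = averaged_noise_variance(2.0, 0.5, 0.01)
var_fine = averaged_noise_variance(2.0, 0.5, 0.001)
```

Both calls return $\lambda / \Delta t = 4$, independent of the sampling step, in agreement with the result $\Lambda / \Delta t$ above; `min_sensing_time` inverts that relation for a given error radius.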

The  previous  discussion  generalizes  to  the  case  of  a  biased  sensor.  In  this  case 
the  maximum  magnitude  of  the  unknown  bias  in  the  probabilistic  model  is  added 
to  the  radius  of  the  bounding  error  ball  of  the  non-deterministic  model.  The  timing 
characteristics  are  not  affected  by  the  presence  of  a  bias. 

Let  us  say  that  the  assumption  of  an  instantaneous  sensor  is  reasonable  whenever 
the time interval $\Delta t$ used to determine the error radius of a sensing uncertainty
ball is significantly smaller than any other time interval used in executing a sensor-
based strategy. In other words, if all motions are executed for some time interval of
considerably greater order than $\Delta t$, then one may regard the sensor as instantaneous,
ignoring the dependency on $\Delta t$. In the upcoming diffusion analysis of a simple
feedback loop, this condition is not satisfied, since computing instantaneous expected
velocities  and  variances  involves  shrinking  all  time  intervals  to  zero  (see  equations  (5.1) 
and  (5.2)).  However,  if  we  take  the  view  that  the  diffusion  analysis  approximates  a 
discrete-time  process  in  which  the  timing  constants  of  the  sensors  are  considerably 
shorter  than  all  other  timing  constants,  then  we  may  continue  to  regard  the 
assumption  of  an  instantaneous  sensor  as  reasonable.  We  will  make  this  assumption, 
bearing  in  mind  that  a  more  complete  analysis  should  consider  a  framework  in  which 
error  balls  are  time-dependent. 


5.3  A  Two-Dimensional  Simple  Feedback  Strategy 

The  tools  are  now  in  place  for  analyzing  the  strategy  outlined  in  section  2.4  in 
the  special  case  that  the  sensing  and  command  errors  have  unbiased  Gaussian 
distributions.  The  reason  that  we  would  like  to  analyze  the  strategy  for  this  special  set 
of  sensing  errors  is  to  determine  how  well  the  strategy  behaves  when  the  uncertainty  is 
fairly  nicely  behaved  itself.  We  know  that  the  strategy  will  always  succeed  eventually, 
independent  of  the  error  distributions,  so  long  as  these  distributions  yield  the  error 
balls  assumed  by  the  strategy.  However,  one  would  like  the  strategy  to  converge 
reasonably  quickly  when  the  error  distributions  are  nicely  behaved.  This  is  because 
there are well-known optimal control strategies in such cases (see, for instance,
[Stengel]). While the randomized strategy suggested in this thesis clearly cannot be
optimal, it will nonetheless converge reasonably quickly for a wide range of starting
positions. Thus one can be assured that if the sensors happen to be fairly well
behaved, then the strategy will converge quickly, and otherwise it will still converge
eventually. Thus one does not need to know precisely how the sensors behave, but can
rely on a general strategy.

The  Task 

Let  us  begin  by  restating  the  task  and  the  strategy.  The  task  is  to  attain  a  disk  of 
radius  r  >  0  centered  at  the  origin  of  the  two-dimensional  plane.  It  is  assumed  that 
the  goal  is  recognizable,  that  is,  there  is  a  one-bit  sensor  that  signals  goal  attainment. 
Additionally there is a position sensor, which has an error ball with radius $\epsilon_s$. Shortly
we will assume that the error distribution is Gaussian, but the statement of the
strategy does not assume any particular distribution. The system is assumed to be
a first order system, with velocities as commands. The error in the actual velocity
executed is likewise assumed to be represented by an error ball of radius $\epsilon = \epsilon_v\,|\mathbf{v}|$,
where $\mathbf{v}$ is the commanded velocity.

The  Strategy 

The  strategy  operates  as  follows.  The  basic  idea  is  to  move  towards  the  origin  when 
doing  so  will  decrease  the  distance  for  all  possible  interpretations  of  the  current  sensed 
position,  and  otherwise  to  execute  a  random  motion.  We  will  model  the  random 
motion  as  a  Brownian  motion,  and  analyze  the  whole  process  as  a  diffusion.  However 
it  should  be  understood  that  this  is  just  an  approximation  to  the  actual  discrete-time 
process,  since  the  strategy  in  general  will  include  a  delay  due  both  to  sensing  and 
motion  execution. 

It  is  possible  to  improve  this  strategy  by  taking  account  of  the  goal,  and  of 
preimages  of  the  goal.  In  particular,  rather  than  trying  to  decrease  the  distance 
to the origin, a strategy could try to decrease the distance to the goal. Additionally,
rather than choosing a completely random motion when it is impossible to decrease
the distance to the goal, the strategy could guess between covering backprojections of
the goal. We have implemented various simulations of these more knowledgeable
strategies, but for our purposes here we will focus on the simple form of the
sensing-guessing strategy. [The term “sensing-guessing” derives from the strategy’s
use  of  both  sensor-based  motions  and  random  motions,  coupled  with  the  view  of 
randomization  as  a  means  of  guessing  the  direction  to  the  goal.] 

Reducing  Distance  to  the  Origin 

First,  let  us  determine  the  conditions  under  which  it  is  possible  to  reduce  the  distance 
to the goal. Consider figure 5.1. Instead of writing points as $\mathbf{x} = (x_1, x_2)$ we will now
write them as $\mathbf{p} = (x, y)$. The sensor value is at the point $(k, 0)$, with $k > 0$. Since
only the distance from the origin is of importance, we can assume that the sensor value
lies on the $x$-axis, as in the figure. In this figure it is possible to reduce the distance
to the origin for all possible interpretations of the sensor value. This is because the
ball of interpretations lies to the right of the two lines passing through the origin with
slopes $\pm\sqrt{1 - \epsilon_v^2}/\epsilon_v$. [These slopes are determined by the lines bounding the velocity
error cone. In particular, $\sin^{-1}(\epsilon_v)$ is just the half-angle of the velocity error cone.]
If the sensed position were close enough so that the error ball overlapped the region
to the left of these lines, then it would not be possible to reduce the distance to the
origin for all possible interpretations of the sensor. In order to see that these lines
correctly characterize the condition under which the distance to the origin may be
reduced, imagine that the state of the system lies on one of these lines. By symmetry,
the commanded velocity will be chosen to be of the form $\mathbf{v} = (-v, 0)$, with $v > 0$. If
the velocity uncertainty is given by $\epsilon_v$, then it is possible for the system to move in a
direction that is perpendicular to the relevant line. Instantaneously this motion does
not change the system’s distance from the origin, and thus represents the boundary
condition between guaranteed approach towards the origin and possible motion away
from the origin.

Figure 5.1: For all interpretations of the indicated sensed position, the distance to
the goal may be decreased.

Figure 5.2: This figure shows the maximum time $t(\mathbf{p})$ that the velocity $\mathbf{v} = (-1, 0)$
may be executed before the system might move further away from the origin than it
is at the start of the motion. The system’s start position is $\mathbf{p}$. The disk bounded by
the dashed circle represents the possible locations at time $t(\mathbf{p})$.

Maximum  Approach  Time 

Given  that  a  sensed  value  lies  far  enough  away  from  the  origin  that  it  is  possible  to 
reduce  the  distance  to  the  origin  for  all  possible  interpretations,  the  question  arises 
as  to  what  the  commanded  approach  velocity  should  be  and  how  long  it  should  be 
executed.  Let  us  just  assume  that  the  commanded  velocity  has  unit  magnitude,  so 
that  we  can  focus  on  the  maximum  amount  of  time  that  the  system  may  execute 
that  velocity  without  moving  further  away  from  the  origin.  Equivalently,  if  one  fixes 
the duration of a motion, then one can adjust the maximum velocity accordingly.

Consider now figure 5.2. The figure indicates the effect of an uncertain motion on
a particular starting position $\mathbf{p}$. The commanded velocity is $\mathbf{v} = (-1, 0)$. At each
instant in time $t > 0$, the set of possible positions is given by the time-indexed forward
projection $F_{\mathbf{v},t}(\{\mathbf{p}\})$, which is an open ball of radius $t\,\epsilon_v$, centered at $\mathbf{p} + t\,\mathbf{v}$. So long
as this ball lies within the system’s starting distance $|\mathbf{p}|$ of the origin, then the motion
has reduced the system’s distance from the origin. For the sensor interpretation $\mathbf{p}$,
the maximum time $t(\mathbf{p})$ that the system may execute the motion is thus given by
the condition that the forward projection at time $t(\mathbf{p})$ just be tangent to the circle of
radius $|\mathbf{p}|$ centered at the origin. Minimizing over all possible sensor interpretations
of the sensed value $\mathbf{p}^* = (k, 0)$, the maximum time that the system may execute the
motion $\mathbf{v} = (-1, 0)$ is thus given by

$$t_{\max}(\mathbf{p}^*) = \min_{\mathbf{p} \in B_{\epsilon_s}(\mathbf{p}^*)} t(\mathbf{p}).$$

Now  let  us  determine  the  maximum  time  f(p)  for  a  given  point  p.  This  time 
satisfies  the  equation 

(5.13)  $|\mathbf{p} + t\,\mathbf{v}| + t\,\epsilon_v = |\mathbf{p}|$.

Clearly $t = 0$ is a solution to this equation, corresponding to the initial degenerate
tangency of the forward projection with the circle of radius $|\mathbf{p}|$. The other solution
in $t$ of this equation will correspond to the maximum allowable time that the velocity
$\mathbf{v}$ may be commanded, assuming that in the interval in between these two times the
inequality $|\mathbf{p} + t\,\mathbf{v}| + t\,\epsilon_v < |\mathbf{p}|$ holds. Solving for $t$ by twice squaring equation (5.13),
we arrive at four possible solutions. Two of these are zero. The remaining two are
given by

$$t = \frac{2 \left( \pm\epsilon_v\,(\mathbf{p} \cdot \mathbf{p})^{1/2} - \mathbf{p} \cdot \mathbf{v} \right)}{\mathbf{v} \cdot \mathbf{v} - \epsilon_v^2}.$$

If $\mathbf{v} = (-1, 0)$, as we have been assuming, and if $\mathbf{p} = (x, y)$, then this becomes

$$t = \frac{2}{1 - \epsilon_v^2} \left( \pm\epsilon_v \sqrt{x^2 + y^2} + x \right).$$

Observe that this really only makes sense if $\epsilon_v < 1$. It is reasonable to thus restrict
$\epsilon_v$, since otherwise commanding a velocity $\mathbf{v}$ could in principle cause a motion in any
arbitrary direction. Denote the solution corresponding to $\epsilon_v \sqrt{x^2 + y^2} + x$ by $t^+$, and
the solution corresponding to $-\epsilon_v \sqrt{x^2 + y^2} + x$ by $t^-$. Of these two solutions, one is
the solution we are seeking, while the other was merely introduced by our squaring
operation. Clearly we want a solution for which $t > 0$, so if we can show that $t^- > 0$,
then it is the desired solution since $t^- < t^+$. In order to see that $t^- > 0$, define the
function

$$\begin{aligned}
f(t) &= |\mathbf{p}| - |\mathbf{p} + t\,\mathbf{v}| - \epsilon_v\,t \\
&= \sqrt{x^2 + y^2} - \sqrt{(x - t)^2 + y^2} - t\,\epsilon_v.
\end{aligned}$$

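Now, $f(0) = 0$, and $f(t^-) = 0$ by definition of $t^-$. Given that the start position $\mathbf{p}$ lies
to the right of the lines of slope $\pm\sqrt{1 - \epsilon_v^2}/\epsilon_v$, the possible locations of the system at
time $t$ must lie wholly inside the circle of radius $|\mathbf{p}|$, at least for small values of time
$t$. Thus the inequality $f(t) > 0$ holds for small values of $t$, as desired. This implies
that $f'(0) > 0$. Computing the derivative of $f$, we see that

$$\begin{aligned}
f'(t) &= \frac{x - t}{\sqrt{(x - t)^2 + y^2}} - \epsilon_v \\
&= \frac{x - t - \epsilon_v \sqrt{(x - t)^2 + y^2}}{\sqrt{(x - t)^2 + y^2}}.
\end{aligned}$$

In particular $\mathrm{sign}(f'(0)) = \mathrm{sign}(x - \epsilon_v \sqrt{x^2 + y^2})$. So we see that $x - \epsilon_v \sqrt{x^2 + y^2} > 0$,
which says that $t^- > 0$, as we wished to show. [We could also have argued directly
that $x - \epsilon_v \sqrt{x^2 + y^2} > 0$ since $\mathbf{p}$ lies to the right of the lines of slope $\pm\sqrt{1 - \epsilon_v^2}/\epsilon_v$.]

Finally, consider $f'(t^-)$. Since $f(t^-) = 0$, we have that

$$0 < \sqrt{(x - t^-)^2 + y^2} = \sqrt{x^2 + y^2} - t^- \epsilon_v.$$

This says that

$$\begin{aligned}
\mathrm{sign}(f'(t^-)) &= \mathrm{sign}\left( x - t^- - \epsilon_v \left( \sqrt{x^2 + y^2} - t^- \epsilon_v \right) \right) \\
&= \mathrm{sign}\left( x - \epsilon_v \sqrt{x^2 + y^2} - (1 - \epsilon_v^2)\,t^- \right) \\
&= \mathrm{sign}\left( -\left( x - \epsilon_v \sqrt{x^2 + y^2} \right) \right),
\end{aligned}$$

where the last step uses $(1 - \epsilon_v^2)\,t^- = 2\,(x - \epsilon_v \sqrt{x^2 + y^2})$.
In other words, $f'(t^-) < 0$. This says that the solution $t^-$ does indeed describe the
maximum duration of the motion.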

In short, given that the starting position is $\mathbf{p}$, the nominal velocity $\mathbf{v} = (-1, 0)$
may be commanded throughout the time interval $[0, t^-]$. During that time interval,
the system’s distance from the origin is guaranteed to be no greater than its starting
distance $|\mathbf{p}|$. Furthermore, for any shorter duration than $t^-$, the system is guaranteed
to approach closer to the origin, independent of the actual error distribution of
velocities within the error ball about the nominal commanded velocity.
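
The closed form for $t^-$ is easy to check numerically. The sketch below (function names are illustrative) evaluates $t^-$ for a start position $\mathbf{p} = (x, y)$ and the slack function $f(t) = |\mathbf{p}| - |\mathbf{p} + t\,\mathbf{v}| - \epsilon_v t$ with $\mathbf{v} = (-1, 0)$; it assumes $0 \le \epsilon_v < 1$ and that $\mathbf{p}$ lies to the right of the two bounding lines.

```python
import math

def max_approach_time(x, y, eps_v):
    """t^- = 2 (x - eps_v |p|) / (1 - eps_v^2) for p = (x, y), v = (-1, 0)."""
    return 2.0 * (x - eps_v * math.hypot(x, y)) / (1.0 - eps_v ** 2)

def slack(x, y, eps_v, t):
    """f(t) = |p| - |p + t v| - eps_v t; positive while the whole forward
    projection lies strictly inside the circle of radius |p|."""
    return math.hypot(x, y) - math.hypot(x - t, y) - eps_v * t

t_minus = max_approach_time(3.0, 1.0, 0.2)
```

At `t_minus` the slack vanishes (the tangency condition), while at intermediate times such as `t_minus / 2` it is positive. With no command error and a start on the $x$-axis the formula reduces to $2x$, the time needed to pass through the origin and return to the starting distance.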

We  have  computed  the  maximum  time  that  a  motion  may  be  executed  for  a 
particular  interpretation  of  the  sensor  position.  Using  this  we  can  find  the  maximum 
time that is safe for all possible interpretations. Given a sensed position $\mathbf{p}^* = (k, 0)$,
with $k$ positive and far enough from the origin, the maximum amount of time that
the velocity $\mathbf{v} = (-1, 0)$ may be commanded is given by

$$\begin{aligned}
t_{\max}(\mathbf{p}^*) &= \min_{\mathbf{p} \in B_{\epsilon_s}(\mathbf{p}^*)} \; -\frac{2}{1 - \epsilon_v^2} \left( \epsilon_v \sqrt{x^2 + y^2} - x \right) \\
&= -\frac{2}{1 - \epsilon_v^2} \max_{\mathbf{p} \in B_{\epsilon_s}(\mathbf{p}^*)} \left( \epsilon_v \sqrt{x^2 + y^2} - x \right).
\end{aligned}$$

Here we are writing $\mathbf{p}$ as $\mathbf{p} = (x, y)$.

Let us therefore focus on maximizing the function $q(x, y) = \epsilon_v \sqrt{x^2 + y^2} - x$,
subject to the constraint that $(x, y) \in B_{\epsilon_s}(\mathbf{p}^*)$. A more sophisticated strategy would
only consider those sensory interpretations that lie outside of the goal. This would
amount to maximizing $q(x, y)$ subject to the constraint that $(x, y) \in B_{\epsilon_s}(\mathbf{p}^*) - G$,
where $G = B_r(0)$ is the goal disk. It is a straightforward matter to modify the
strategy accordingly, but we will not do so here.

If $\epsilon_v = 0$, that is, if there is no command error, then $q(x, y) = -x$. Thus $q(x, y)$ is
maximized when $x$ is as close to the origin as possible. If the strategy does not take
the goal into account in deciding which points need to be moved closer to the origin,
but considers the full sensing error ball, then $q(x, y)$ is maximized at $x = k - \epsilon_s$. This
is the smallest $x$-coordinate of a point in the sensing error ball $B_{\epsilon_s}(\mathbf{p}^*)$.
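
Before solving the maximization analytically, it can be approximated by brute force. The sketch below samples the closed sensing ball about $\mathbf{p}^* = (k, 0)$ on a polar grid and returns $-2\,q_{\max}/(1 - \epsilon_v^2)$. It is a numerical stand-in rather than the closed-form solution derived in the text; the function names and grid sizes are arbitrary choices of ours.

```python
import math

def q(x, y, eps_v):
    """q(x, y) = eps_v * sqrt(x^2 + y^2) - x."""
    return eps_v * math.hypot(x, y) - x

def safe_time(k, eps_s, eps_v, n_r=50, n_theta=360):
    """Approximate the maximum safe time for sensed value (k, 0) by
    maximizing q over a polar grid covering the sensing error ball."""
    q_max = -math.inf
    for i in range(n_r + 1):
        r = eps_s * i / n_r
        for j in range(n_theta):
            theta = 2.0 * math.pi * j / n_theta
            q_max = max(q_max, q(k + r * math.cos(theta), r * math.sin(theta), eps_v))
    return -2.0 * q_max / (1.0 - eps_v ** 2)

t_no_command_error = safe_time(5.0, 1.0, 0.0)
```

With $\epsilon_v = 0$ the maximizer is the point $(k - \epsilon_s, 0)$, so `t_no_command_error` is $2\,(k - \epsilon_s) = 8$ for the values shown.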

Now consider the case $0 < \epsilon_v < 1$. Let us construct the level curves in the plane,
given by $q(x, y) = c$, with $c$ some constant. Since $k > 0$ and far enough from the
origin, we can assume without loss of generality that $x > 0$. Furthermore, by the
same argument that showed that $t^- > 0$ above, we can assume that $c < 0$. Thus we
have that

(5.14)  $x + c = \epsilon_v \sqrt{x^2 + y^2}$,
        $x^2 + 2xc + c^2 = \epsilon_v^2\,(x^2 + y^2)$.

So

$$(1 - \epsilon_v^2)\,x^2 + 2xc + c^2 - \epsilon_v^2\,y^2 = 0,$$

from which we see that the level curves are hyperbolas, given by

$$\frac{\left( x + \dfrac{c}{1 - \epsilon_v^2} \right)^2}{\left( \dfrac{c\,\epsilon_v}{1 - \epsilon_v^2} \right)^2} - \frac{y^2}{\left( \dfrac{c}{\sqrt{1 - \epsilon_v^2}} \right)^2} = 1,$$

with $c < 0$. See figure 5.3. We are interested in the right-hand branch. In particular,
we are interested in finding the hyperbola with the maximum $c$ value that touches a
point in the sensing error ball about the sensed value $\mathbf{p}^*$. It is clear that the maximum
$c$ value is achieved at the boundary of the sensing error ball. Thus we are looking for
a hyperbola that is tangent to the circle of radius $\epsilon_s$ that is centered at $\mathbf{p}^*$. There are
two possibilities.

First,  it  is  possible  that  the  curvature  of  a  hyperbola  at  its  vertex  exceeds  the 
curvature  of  the  circle  that  bounds  the  sensing  error  ball.  In  that  case  there  are  two 
potential  tangency  points  in  the  first  quadrant,  that  is,  there  are  two  locations  along 
the  upper  right  branch  of  the  hyperbola  at  which  a  horizontal  translation  would  bring 
the  hyperbola  into  tangential  contact  with  the  circle.  One  of  the  potential  tangencies 
occurs  at  the  vertex  of  the  hyperbola  on  the  x-axis.  The  other  tangency  occurs 
somewhere  further  along  the  hyperbola.  Our  aim  is  to  find  one  such  hyperbola  that 
is  actually  tangent  to  the  circle,  and  whose  associated  c  value  is  a  maximum.  Second, 
it  is  possible  that  the  curvature  of  the  hyperbolas  is  less  than  that  of  the  sensing  error 
circle.  In  that  case,  the  only  point  of  potential  tangency  occurs  on  the  x-axis,  and 
thus  the  maximizing  hyperbola  is  given  by  that  hyperbola  which  passes  through  the 
point $(k - \epsilon_s, 0)$.

Let us first solve for the tangency condition, then worry about the curvature
issue later. Let us assume that the sensing error ball lies strictly inside the wedge
determined by the two rays emanating from the origin into the right-half plane with
slopes $\pm\sqrt{1 - \epsilon_v^2}/\epsilon_v$. This condition is given by $k > \epsilon_s/\sqrt{1 - \epsilon_v^2}$. If this condition is
not satisfied, then commanding velocity $v = (-1, 0)$ for a non-zero duration of time
could potentially increase the distance from the origin for some point in the sensing
error ball.

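We can write the equations for the circle and the hyperbola as:

$$(x - k)^2 + y^2 = \epsilon_s^2 \quad\text{and}\quad \frac{(x - h)^2}{a^2} - \frac{y^2}{b^2} = 1,$$

with $k > 0$ as above, and

$$h = \frac{-c}{1 - \epsilon_v^2}, \qquad a = \frac{-c\,\epsilon_v}{1 - \epsilon_v^2}, \qquad b = \frac{-c}{\sqrt{1 - \epsilon_v^2}}.$$

If we eliminate y from these equations we get

$$\epsilon_s^2 - (x - k)^2 = \frac{b^2}{a^2}(x - h)^2 - b^2.$$

In other words,

$$(a^2 + b^2)\,x^2 - 2\,(a^2 k + b^2 h)\,x + \left[a^2 k^2 + b^2 h^2 - a^2\epsilon_s^2 - a^2 b^2\right] = 0.$$

So

$$x = \frac{a^2 k + b^2 h \pm \sqrt{(a^2 k + b^2 h)^2 - (a^2 + b^2)(a^2 k^2 + b^2 h^2 - a^2\epsilon_s^2 - a^2 b^2)}}{a^2 + b^2}.$$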

Figure 5.3: The right branch of a hyperbola. The hyperbola is parameterized by c,
and represents an iso-value line of the function $q(x,y) = \epsilon_v\sqrt{x^2 + y^2} - x$ described
by equation (5.14).

Since we are looking for a tangency point, the discriminant in this solution must
actually be zero. This additional constraint allows us to solve for c and thus for the
appropriate hyperbola that is tangent to the circle. Thus

$$\begin{aligned}
0 &= a^4 k^2 + b^4 h^2 + 2a^2 b^2 hk - a^4 k^2 - a^2 b^2 h^2 + a^4\epsilon_s^2 + a^4 b^2 \\
  &\qquad - a^2 b^2 k^2 - b^4 h^2 + a^2 b^2 \epsilon_s^2 + a^2 b^4 \\
  &= 2a^2 b^2 hk - a^2 b^2 h^2 + a^4\epsilon_s^2 + a^4 b^2 - a^2 b^2 k^2 + a^2 b^2\epsilon_s^2 + a^2 b^4 \\
  &= -a^2 b^2 (h - k)^2 + a^2 (a^2 + b^2)(\epsilon_s^2 + b^2).
\end{aligned} \tag{5.15}$$

Since $a > 0$, one can divide equation (5.15) by $a^2$ to obtain

$$0 = -b^2 (h - k)^2 + (a^2 + b^2)(\epsilon_s^2 + b^2). \tag{5.16}$$

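If we instantiate the values of a, b, and h, we see that

$$a^2 + b^2 = \frac{c^2}{(1 - \epsilon_v^2)^2},$$

and thus equation (5.16) becomes

$$0 = -\frac{c^2}{1 - \epsilon_v^2}\left(k + \frac{c}{1 - \epsilon_v^2}\right)^2 + \frac{c^2}{(1 - \epsilon_v^2)^2}\left(\epsilon_s^2 + \frac{c^2}{1 - \epsilon_v^2}\right).$$

Since $c \neq 0$ and $0 < \epsilon_v < 1$, the last constraint may be rewritten as

$$0 = -\left(k + \frac{c}{1 - \epsilon_v^2}\right)^2 (1 - \epsilon_v^2)^2 + \epsilon_s^2\,(1 - \epsilon_v^2) + c^2,$$

and thus

$$c = \frac{\epsilon_s^2 - k^2 (1 - \epsilon_v^2)}{2k}. \tag{5.17}$$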

This  value  of  c  determines  the  correct  hyperbola  that  is  tangent  to  the  boundary 
of  the  sensing  error  ball,  assuming  that  there  is  a  non-trivial  tangency.  The  existence 
of  a  non-trivial  tangency  is  determined  by  the  curvature  of  the  hyperbola  and  the 
circle. Trivial tangency means that the hyperbola is tangent to the circle at the point
$(k - \epsilon_s, 0)$. This implies that

$$c = -(1 - \epsilon_v)(k - \epsilon_s). \tag{5.18}$$

Although we will not require it in the sequel, let us determine the condition under
which only trivial tangency is possible. See figure 5.4. The circle has curvature $1/\epsilon_s$.
Let us compute the curvature of the hyperbola

$$\frac{x^2}{a^2} - \frac{y^2}{b^2} = 1. \tag{5.19}$$

Figure 5.4: The first hyperbola has smaller curvature than the circle, and thus is
tangent only in a trivial sense. The second hyperbola has greater curvature, and thus
is tangent to the circle in a non-trivial sense.


In general for a curve $y = y(x)$ the curvature $\kappa$ at a point $(x, y)$ is given by

$$\kappa(x, y) = \frac{|y''(x)|}{\left(1 + y'(x)^2\right)^{3/2}}.$$

For the simple hyperbola (5.19) this expression becomes

$$\kappa(x, y) = \frac{b\,a^4}{\left[(a^2 + b^2)\,x^2 - a^4\right]^{3/2}}, \qquad x \ge a.$$

In particular at the vertex $(a, 0)$, the curvature is given by

$$\kappa(a, 0) = \frac{a}{b^2},$$

which becomes

$$\kappa = -\frac{\epsilon_v}{c}$$

if we instantiate the values of a and b for the class of hyperbolas that we have been
considering. In order for a non-trivial tangency to exist the curvature of the touching
hyperbola must exceed the curvature of the circle, that is, $\kappa > 1/\epsilon_s$. If we substitute
the maximizing value of c for the touching hyperbola, as given by equation (5.17), we
see that this constraint becomes:

$$-\frac{\epsilon_v}{c} > \frac{1}{\epsilon_s}, \qquad\text{that is,}\qquad \epsilon_s > \frac{k^2 (1 - \epsilon_v^2) - \epsilon_s^2}{2\,\epsilon_v k}, \qquad\text{that is,}\qquad \epsilon_s > k\,(1 - \epsilon_v).$$

For  the  sake  of  our  analysis  of  the  sensing-guessing  strategy,  let  us  not  worry 
about  whether  the  maximizing  c  is  determined  by  equation  (5.17)  or  by  equation 
(5.18).  Instead,  we  will  conservatively  pick  the  larger  of  these.  As  it  turns  out, 
this  is  always  given  by  (5.17),  even  when  (5.17)  does  not  physically  correspond  to  a 
hyperbola  that  has  a  non-trivial  tangency  with  the  sensing  circle.  In  order  to  see  this, 
consider  the  inequality  that  we  would  like  to  prove: 

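$$\frac{\epsilon_s^2 - k^2 (1 - \epsilon_v^2)}{2k} \ge -(1 - \epsilon_v)(k - \epsilon_s). \tag{5.20}$$

That is:

$$\epsilon_s^2 - k^2 + k^2\epsilon_v^2 \ge -2k^2 + 2k\epsilon_s + 2k^2\epsilon_v - 2k\epsilon_v\epsilon_s,$$
$$k^2 (1 - \epsilon_v)^2 - 2\epsilon_s (1 - \epsilon_v)\,k + \epsilon_s^2 \ge 0.$$

Now consider the function $g(x) = x^2 (1 - \epsilon_v)^2 - 2\epsilon_s (1 - \epsilon_v)\,x + \epsilon_s^2$. Observe that

$$g\!\left(\frac{\epsilon_s}{1 - \epsilon_v}\right) = 0, \qquad g'\!\left(\frac{\epsilon_s}{1 - \epsilon_v}\right) = 0, \qquad g''(x) > 0.$$

Thus we see that g is a non-negative function, which establishes the inequality
(5.20). We see then that q(x,y) = c is maximized for some c that is bounded from
above by the value of c given by (5.17). Thus, in deciding on the maximum amount
of time that the velocity $v = (-1, 0)$ may be executed, it is safe to take c to be given
by (5.17). This follows from the definition (5.14). We thus have: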


$$t_{\max}(k) = -\frac{2}{1 - \epsilon_v^2}\;\frac{\epsilon_s^2 - k^2 (1 - \epsilon_v^2)}{2k}. \tag{5.21}$$
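
The relationship between (5.17), (5.18), and (5.21) can be checked numerically. The following is a minimal sketch, assuming that (5.21) reads $t_{\max}(k) = -2c/(1 - \epsilon_v^2)$ with c taken from (5.17); the function names and the sample error radii are illustrative choices, not part of the original analysis.

```python
import math

def c_tangent(k, eps_s, eps_v):
    """c from equation (5.17): hyperbola with a non-trivial tangency."""
    return (eps_s**2 - k**2 * (1.0 - eps_v**2)) / (2.0 * k)

def c_trivial(k, eps_s, eps_v):
    """c from equation (5.18): hyperbola through the point (k - eps_s, 0)."""
    return -(1.0 - eps_v) * (k - eps_s)

def t_max(k, eps_s, eps_v):
    """Equation (5.21), written as -2c / (1 - eps_v^2) with c from (5.17)."""
    return -2.0 * c_tangent(k, eps_s, eps_v) / (1.0 - eps_v**2)

if __name__ == "__main__":
    eps_s, eps_v = 7.0, 0.5                      # illustrative error radii
    k_min = eps_s / math.sqrt(1.0 - eps_v**2)    # smallest k allowing a safe motion
    for k in (k_min, 10.0, 14.0, 20.0, 50.0):
        # Inequality (5.20) says this gap is never negative.
        gap = c_tangent(k, eps_s, eps_v) - c_trivial(k, eps_s, eps_v)
        print(f"k={k:7.3f}  gap={gap:8.5f}  t_max={t_max(k, eps_s, eps_v):8.4f}")
```

The gap equals $(k(1 - \epsilon_v) - \epsilon_s)^2/(2k)$, so it vanishes exactly when $k(1 - \epsilon_v) = \epsilon_s$, and the permitted time shrinks to zero as k approaches $\epsilon_s/\sqrt{1 - \epsilon_v^2}$.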

5.4  Analysis of the Sensing-Guessing Strategy in a Simple Case

We  are  now  in  a  position  to  analyze  the  sensing-guessing  strategy  outlined  earlier  (see 
page  235).  We  will  focus  on  a  particularly  nice  version  of  the  strategy,  in  which  the 
sensing  and  command  errors  are  unbiased  and  normally  distributed.  Despite  such 
nice  distributions  the  analysis  will  quickly  become  complicated.  For  this  reason  most 
of  the  results  in  this  section  are  numerical. 

Let  us  assume  that  the  strategy  executes  a  simple  feedback  loop  that  repeatedly 
senses  the  current  position,  then,  depending  on  the  distance  of  the  sensed  value 
from  the  origin,  either  executes  a  Brownian  motion  for  a  short  period  of  time  or 
reduces  the  distance  to  the  goal  for  all  possible  sensor  interpretations.  Let  us  fix  the 
maximum  possible  time  interval  between  sensing  operations  as  dt.  This  time  interval 
is  used  to  compute  a  maximum  commanded  velocity  of  approach,  analogous  to  the 
maximum time computation (5.21). Although the strategy assumes a maximum
duration between sensing operations of time dt, we will permit the actual duration
to be $\Delta t$, with $\Delta t \le dt$. In a sense, the quantity $1/dt$ serves as a cap on the
maximum velocity magnitude that may be executed. This prevents the strategy from
becoming a jump process as the time interval $\Delta t$ shrinks to zero. Instead, the process
becomes a diffusion process, and we can use the analysis of this diffusion process to
approximately characterize the behavior of the sensing-guessing strategy.[12]

Throughout  we  assume  that  sensing  is  instantaneous,  by  which  we  mean  that  the 
time  constants  associated  with  sensing  are  much  smaller  than  those  of  the  rest  of 
the  system  (see  section  5.2).  While  instantaneous  sensing  could  be  used  to  achieve 
perfect  information  for  the  unbiased  sensor  distribution  that  we  are  assuming,  the 
strategy  is  not  actually  aware  of  this  distribution.  Recall  that  the  strategy  should 
succeed  independent  of  the  actual  distribution. 

The sensing and command errors are assumed to be two-dimensional normal
variates with zero bias. We will use a certainty threshold of 98.9% in approximating
these errors by uncertainty balls. See again section 5.2. Thus the standard deviation
of the sensing error is given by $\sigma_s = \frac{1}{3}\epsilon_s$. Similarly, the standard deviation of the
velocity error is given by $\sigma_v = \frac{1}{3}\epsilon_v |v|$, where v is the commanded velocity.

Consider now a sensed value p* at a distance $k > 0$ from the origin. If
$k \le \epsilon_s/\sqrt{1 - \epsilon_v^2}$, then it is not possible to move all interpretations of the sensed
value closer to the origin. In this case, the system executes a Brownian motion for
time $\Delta t \le dt$, then takes a new sensor reading. Let us assume that the infinitesimal
variance of the Brownian motion is given coordinate-wise by $\sigma_B^2$.

If $k > \epsilon_s/\sqrt{1 - \epsilon_v^2}$, then the system executes a motion directed towards the origin
for time $\Delta t \le dt$, followed by a new sensor reading. The commanded velocity v is
parallel to the vector $-p^*$. We can determine the maximum allowable magnitude of

[12] One might very well be interested in a jump process. Indeed, one of the random strategies
suggested for the example of section 2.4 was a jump process. However, we will not consider these
here.

Figure 5.5: The system is at location (a,0). The possible sensor values form a disk
of radius $\epsilon_s$ about this point. If a sensor value is at least distance d away from the
origin, then the system can execute a motion guaranteed to reduce its distance from
the origin. The sensor values are shown in a polar coordinate representation $(r, \theta)$
relative to the actual position of the system. For each r there is a maximum angle
$\theta_r$ for which the sensor value lies far enough from the origin. This means that the
sensor value $p^*(r, \theta)$ lies at least distance d from the origin whenever $|\theta| \le \theta_r$.


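this velocity by an argument similar to the one used to establish (5.21). Thus

$$|v| = \frac{t_{\max}(k)}{dt} = \frac{1}{dt}\left(k - \frac{1}{k}\,\frac{\epsilon_s^2}{1 - \epsilon_v^2}\right). \tag{5.22}$$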
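
The feedback loop described above can be imitated by a small discrete-time simulation. The sketch below is only illustrative and makes several assumptions that are not in the text: errors are drawn from untruncated normal distributions, the commanded speed is taken to be $k - \epsilon_s^2/(k(1 - \epsilon_v^2))$ divided by dt, the loop stops when the true position (rather than a sensed one) enters a goal disk, and the function name `sensing_guessing` and all default parameter values are hypothetical.

```python
import math
import random

def sensing_guessing(start, goal_radius, eps_s=7.0, eps_v=0.5, sigma_b2=1.0,
                     dt=1.0, delta_t=0.1, max_steps=100000, rng=None):
    """Simulate the sense / move-or-randomize loop; return (position, steps used)."""
    rng = rng or random.Random(0)            # fixed seed for repeatability
    x, y = start
    sigma_s = eps_s / 3.0                    # sensing standard deviation
    sigma_v = eps_v / 3.0                    # relative velocity standard deviation
    d = eps_s / math.sqrt(1.0 - eps_v**2)    # minimum useful sensed distance
    for step in range(max_steps):
        if math.hypot(x, y) <= goal_radius:  # assumption: termination uses true position
            return (x, y), step
        # Take a noisy sensor reading of the current position.
        sx = x + rng.gauss(0.0, sigma_s)
        sy = y + rng.gauss(0.0, sigma_s)
        k = math.hypot(sx, sy)
        if k > d:
            # Useful reading: move toward the origin along -p*, with velocity error.
            speed = (k - eps_s**2 / (k * (1.0 - eps_v**2))) / dt
            vx, vy = -speed * sx / k, -speed * sy / k
            x += (vx + rng.gauss(0.0, sigma_v * speed)) * delta_t
            y += (vy + rng.gauss(0.0, sigma_v * speed)) * delta_t
        else:
            # No guaranteed progress is possible: execute a short Brownian motion.
            s = math.sqrt(sigma_b2 * delta_t)
            x += rng.gauss(0.0, s)
            y += rng.gauss(0.0, s)
    return (x, y), max_steps

if __name__ == "__main__":
    position, steps = sensing_guessing((50.0, 0.0), goal_radius=10.0)
    print(position, steps)
```

Shrinking `goal_radius` well below d forces the loop to rely on the Brownian branch, which is the regime the later analysis describes as slow.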


5.4.1  Expected  Progress 

Since  the  problem  is  radially  symmetric,  we  can  assume  that  the  actual  position  lies 
on  the  x-axis  at  the  point  (a,0),  with  a  >  0.  The  sensed  value  p*  lies  (with  probability 
0.989) in a circle of radius $\epsilon_s$, centered at (a,0).

Expected Change in Position

Let  us  compute  the  expected  change  in  position  assuming  that  the  sensed  value  lies 
far  enough  away  from  the  origin  that  the  strategy  can  execute  a  motion  guaranteed 
to  reduce  the  distance  to  the  origin.  Figure  5.5  indicates  the  portion  of  the  sensing 
error  ball  for  which  the  sensed  values  lie  far  enough  from  the  origin.  By  symmetry 
of  the  sensing  error  ball  about  the  x-axis  and  by  symmetry  of  the  velocity  error  ball, 
it is clear that the expected change in the y-coordinate of the position (a, 0) is zero.
For  this  reason  we  will  focus  simply  on  the  expected  change  in  the  x-coordinate. 
Furthermore,  since  the  velocity  error  is  assumed  to  be  unbiased  as  well  as  symmetric, 
in  taking  expectations  we  can  simply  average  over  the  x-coordinate  of  all  commanded 
velocities.  The  averaging  here  is  done  with  respect  to  the  distribution  of  the  possible 
commanded  velocities,  that  is,  with  respect  to  the  distribution  of  observable  sensor 
values. 

In  order  to  compute  the  expected  change  in  position,  let  us  notice  that  for  each 
observed  sensor  value,  the  system  either  executes  a  random  motion  or  a  deterministic 
motion,  depending  on  the  distance  of  the  sensed  value  from  the  origin.  If  the  observed 
sensor  value  lies  far  enough  from  the  origin,  then  the  expected  change  in  position  is 
simply  the  commanded  velocity  times  the  duration  of  the  motion  At.  We  can  integrate 
these  commanded  velocities  over  all  possible  sensor  values  that  lie  far  enough  from  the 
origin,  weighting  the  integrand  by  the  density  function  that  describes  the  sensor  error. 
The  resulting  quantity  is  the  expected  velocity  of  the  system  due  to  non-randomizing 
motions. Let us define x(a) to be the x-component of this integral. Said differently,
x(a) is the expected instantaneous change in the x position given that the starting
position is at (a, 0) and that the distance of the sensed value from the origin is greater
than $\epsilon_s/\sqrt{1 - \epsilon_v^2}$, times the probability of actually obtaining a sensor value that far
from the origin.

We can write each possible sensor value p* in polar form relative to the actual
position (a,0). Specifically, $p^*(r, \theta) = (x(r, \theta),\, y(r, \theta))$, where $x(r, \theta) = a + r\cos\theta$
and $y(r, \theta) = r\sin\theta$. We will denote by $v(r, \theta)$ the velocity command issued when
the sensed value is $p^*(r, \theta)$. Observe also that a sensor value $p^*(r, \theta)$ is located at a
distance $k = k(r, \theta)$ from the origin, where

$$k(r, \theta) = \sqrt{a^2 + r^2 + 2ar\cos\theta}. \tag{5.23}$$

Given the range of possible sensor values $B_{\epsilon_s}(a, 0)$ when the system is at the
point (a,0), and given a disk $B_d(0, 0)$ of radius d centered at the origin, consider the
set of sensor values in the set difference $B_{\epsilon_s}(a, 0) - B_d(0, 0)$. This is the set of
possible sensor values that are at least distance d away from the origin. If we take d
to be $\epsilon_s/\sqrt{1 - \epsilon_v^2}$, then this set consists of those sensor values for which the strategy
can safely move towards the origin, that is, for which the strategy can execute a
motion guaranteed to reduce the system's distance from the origin, independent of
the system's actual location within the sensing error ball $B_{\epsilon_s}(a, 0)$.

Now consider the ring of sensor values at a fixed distance r from the point (a,0).
For some, possibly null, range of angles $(-\theta_r, \theta_r)$, the sensor values $p^*(r, \theta)$ lie at least
distance d from the origin. See figure 5.5. First let us determine the range of radii
r for which this range is non-empty, then let us find an explicit expression for $\theta_r$.
Clearly, if the sensing ball about (a, 0) lies inside the disk of radius d, then no sensor
value lies at least distance d from the origin. Similarly, if the actual location (a, 0)
lies at least distance d from the origin, then there will be sensor values at all possible
radii r that lie at least distance d from the origin. Thus the set of radii
for which the interval $(-\theta_r, \theta_r)$ is non-empty is given by:

$$[r_{\min}, r_{\max}] = \begin{cases} \emptyset, & \text{if } a + \epsilon_s \le d; \\ [0, \epsilon_s], & \text{if } a \ge d; \\ [d - a, \epsilon_s], & \text{otherwise.} \end{cases} \tag{5.24}$$

For a given $r \in [r_{\min}, r_{\max}]$, the angular endpoint $\theta_r$ is given by:

$$\theta_r = \begin{cases} \pi, & \text{if } a - r \le -d \text{ or } a - r \ge d; \\ \cos^{-1}\!\left(\dfrac{d^2 - a^2 - r^2}{2ar}\right), & \text{otherwise (assuming } a + r \ge d\text{).} \end{cases}$$

The $\cos^{-1}$ function is taken to have values in the range $[0, \pi]$.
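
The two case analyses above translate directly into code. The following sketch assumes the inverse-cosine argument $(d^2 - a^2 - r^2)/(2ar)$, which follows from setting $k(r, \theta) = d$ in (5.23); the function names are illustrative.

```python
import math

def radius_range(a, eps_s, d):
    """Equation (5.24): radii with a non-empty useful angular interval, or None."""
    if a + eps_s <= d:
        return None                 # the whole sensing ball lies inside the disk
    if a >= d:
        return (0.0, eps_s)         # the actual position is already far enough out
    return (d - a, eps_s)

def theta_r(a, r, d):
    """Half-width of the angular interval of useful sensor values at radius r."""
    if a - r <= -d or a - r >= d:
        return math.pi              # the entire ring lies at least d from the origin
    if a + r < d:
        return 0.0                  # the entire ring lies inside the disk
    arg = (d * d - a * a - r * r) / (2.0 * a * r)
    return math.acos(max(-1.0, min(1.0, arg)))   # clamp against round-off

if __name__ == "__main__":
    a, r, d = 3.0, 4.0, 5.0
    th = theta_r(a, r, d)
    # At the endpoint angle the sensed value sits at distance d from the origin.
    print(th, math.sqrt(a * a + r * r + 2.0 * a * r * math.cos(th)))
```

For a = 3, r = 4, d = 5 the endpoint is $\pi/2$, since $k^2 = 25 + 24\cos\theta \ge 25$ exactly when $\cos\theta \ge 0$.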

The  reason  for  representing  the  sensor  values  in  terms  of  polar  coordinates  relative 
to  the  actual  location  of  the  system  is  that  the  probability  density  function  for  the 
possible  sensor  values  has  a  simple  form  in  polar  coordinates.  Specifically,  the  density 
function  in  polar  coordinates  corresponding  to  an  unbiased  two-dimensional  normal 
variate with variance $\sigma^2$ is given by:

$$p(r, \theta) = \frac{r}{2\pi\sigma^2}\exp\left\{-\frac{r^2}{2\sigma^2}\right\}, \qquad 0 \le r < \infty. \tag{5.25}$$

As one would expect, the density function is uniform in $\theta$, that is, it is constant
for constant r. Although the function is defined for all non-negative r, we will only
consider $r \in [0, \epsilon_s]$, where $\epsilon_s = 3\sigma$. Over this reduced range $p(r, \theta)$ is no longer a
density function. However, if the sensor values are indeed constrained to this finite
error ball, then one can regain a density function simply by dividing by approximately
0.989 throughout.
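
The normalizing constant 0.989 is the probability mass that the density (5.25) places inside the disk of radius $3\sigma$. Integrating (5.25) over $\theta$ and over $0 \le r \le R$ gives $1 - e^{-R^2/(2\sigma^2)}$, which the short check below evaluates; the function name is illustrative.

```python
import math

def mass_within(radius, sigma):
    """Probability that a zero-mean 2-D normal variate (std sigma per axis) lies within radius."""
    return 1.0 - math.exp(-radius**2 / (2.0 * sigma**2))

print(round(mass_within(3.0, 1.0), 4))   # 0.9889
```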

The expected instantaneous displacement in the x-direction is thus given by:

$$x(a) = \int_{r_{\min}}^{r_{\max}} \int_{-\theta_r}^{\theta_r} v_x(r, \theta)\; p(r, \theta)\; d\theta\, dr, \tag{5.26}$$

where $v(r, \theta) = (v_x, v_y)$ is the commanded velocity determined by $p^*(r, \theta)$.

Let us expand this formula slightly for the case dt = 1. One can simply divide
by dt in the general case. Observe that the x-component $v_x(r, \theta)$ of the commanded
velocity is of the form:

$$-\left(k - \frac{1}{k}\,\frac{\epsilon_s^2}{1 - \epsilon_v^2}\right)\frac{x(r, \theta)}{|p^*(r, \theta)|}.$$

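Let us focus on the inner integral; call it $x_r$. Then we see that

$$\begin{aligned}
x_r &= \int_{-\theta_r}^{\theta_r} -\left(k - \frac{1}{k}\,\frac{\epsilon_s^2}{1 - \epsilon_v^2}\right)\frac{x(r, \theta)}{|p^*(r, \theta)|}\, d\theta \\
    &= 2\int_{0}^{\theta_r} -\left(k - \frac{1}{k}\,\frac{\epsilon_s^2}{1 - \epsilon_v^2}\right)\frac{x(r, \theta)}{|p^*(r, \theta)|}\, d\theta \\
    &= 2\int_{0}^{\theta_r} -\left(k - \frac{1}{k}\,\frac{\epsilon_s^2}{1 - \epsilon_v^2}\right)\frac{a + r\cos\theta}{k}\, d\theta \\
    &= 2\int_{0}^{\theta_r} -\left(1 - \frac{1}{k^2}\,\frac{\epsilon_s^2}{1 - \epsilon_v^2}\right)(a + r\cos\theta)\, d\theta \\
    &= 2\int_{0}^{\theta_r} -\left(1 - \frac{\epsilon_s^2}{(a^2 + r^2 + 2ar\cos\theta)(1 - \epsilon_v^2)}\right)(a + r\cos\theta)\, d\theta \\
    &= \frac{2\epsilon_s^2}{1 - \epsilon_v^2}\int_{0}^{\theta_r} \frac{a + r\cos\theta}{a^2 + r^2 + 2ar\cos\theta}\, d\theta \;-\; 2\int_{0}^{\theta_r} (a + r\cos\theta)\, d\theta \\
    &= \frac{2\epsilon_s^2}{1 - \epsilon_v^2}\left[\frac{\theta_r}{2a} + \frac{\operatorname{sign}(a^2 - r^2)}{a}\,\tan^{-1}\!\left(\frac{|a - r|}{a + r}\tan\frac{\theta_r}{2}\right)\right] - 2\left[a\,\theta_r + r\sin\theta_r\right].
\end{aligned}$$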


Observe that when a is large, that is, when the system is located far from the
origin, the significant term in the expression for $x_r$ is $-2a\theta_r$. Similarly, when a is
small, although the terms proportional to 1/a now become significant, they tend to
be of equal magnitude but opposite sign. Furthermore, for the permissible range of r
given by equation (5.24), since $x_r$ is negative by construction, these two terms tend
to be canceled by the term $-2r\sin\theta_r$. Thus again the term proportional to a seems
to  be  the  significant  term.  In  short,  we  see  that  the  sensor  essentially  acts  almost 
like  a  spring,  pulling  the  system  towards  the  origin  in  near  proportion  to  its  distance 
from  the  origin.  This  is  not  completely  correct,  but  it  will  suffice  as  a  qualitative 
description. 
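
A closed form of this kind is easy to get wrong, so it is worth comparing against direct quadrature. The sketch below does so for dt = 1. It assumes the closed form $x_r = \frac{2\epsilon_s^2}{1 - \epsilon_v^2}\left[\frac{\theta_r}{2a} + \frac{\operatorname{sign}(a^2 - r^2)}{a}\tan^{-1}\!\left(\frac{|a - r|}{a + r}\tan\frac{\theta_r}{2}\right)\right] - 2[a\theta_r + r\sin\theta_r]$, obtained from the standard antiderivative of $1/(A + B\cos\theta)$; the function names and test values are illustrative.

```python
import math

def inner_closed_form(a, r, theta, eps_s, eps_v):
    """Closed form of the inner integral over the angles (-theta, theta), dt = 1."""
    if theta >= math.pi:
        atan_term = math.pi / 2.0            # limit of atan(c * tan(theta/2)) for c > 0
    else:
        atan_term = math.atan(abs(a - r) / (a + r) * math.tan(theta / 2.0))
    sign = (a > r) - (a < r)                 # sign(a^2 - r^2) for positive a and r
    rational = theta / (2.0 * a) + sign * atan_term / a
    coeff = 2.0 * eps_s**2 / (1.0 - eps_v**2)
    return coeff * rational - 2.0 * (a * theta + r * math.sin(theta))

def inner_numeric(a, r, theta, eps_s, eps_v, n=20000):
    """Midpoint-rule value of 2 * integral_0^theta v_x dtheta, dt = 1."""
    h = theta / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        k = math.sqrt(a * a + r * r + 2.0 * a * r * math.cos(t))
        speed = k - eps_s**2 / (k * (1.0 - eps_v**2))
        total += -speed * (a + r * math.cos(t)) / k   # x-component of the command
    return 2.0 * total * h

if __name__ == "__main__":
    for a, r, th in ((12.0, 3.0, 1.0), (2.0, 5.0, 0.8), (9.0, 4.0, math.pi)):
        print(inner_closed_form(a, r, th, 7.0, 0.5), inner_numeric(a, r, th, 7.0, 0.5))
```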

Finally,  the  expected  drift  in  the  x-direction  is  determined  by  integrating  over  the 
allowable  radii  r,  that  is: 


$$x(a) = \frac{1}{2\pi\sigma^2}\int_{r_{\min}}^{r_{\max}} x_r\; r\, e^{-r^2/(2\sigma^2)}\, dr.$$


This integral does not admit a nice explicit description. Instead, we will consider
some  numerical  examples  later  on.  The  important  observation  is  that  the  sensor 
essentially  acts  like  a  spring.  As  we  mentioned  in  section  5.1.4,  pure  Brownian  motion 
tends  to  push  a  system  away  from  the  origin.  The  question  then  is  whether  the 
pull  of  the  sensor  towards  the  goal  is  strong  enough  to  overcome  the  natural  push 
outward  due  to  random  motions.  Recall  that  the  random  motions  are  required  since 
the  system  does  not  know  what  the  error  distributions  are,  but  nonetheless  should 
guarantee  eventual  convergence. 

Qualitatively speaking, the inward pull due to sensing is proportional to the
distance  from  the  origin,  while  the  outward  push  due  to  randomization  is  inversely 
proportional  to  the  distance  from  the  origin.  Thus  there  will  be  a  range  for  which  the 
sensor  dominates,  and  the  system  moves  towards  the  origin  on  average.  However,  as 
the  system  approaches  close  to  the  origin,  eventually  the  randomization  will  dominate, 
and  the  system  will  move  away  from  the  origin  on  average.  At  the  boundary  between 
these  two  modes  of  behavior,  the  system  moves  neither  inward  nor  outward,  on 
average.  If  the  goal  is  large  enough,  the  system  will  be  sucked  into  the  goal  in 
an  almost  deterministic  fashion.  This  was  the  gist  of  our  discussion  on  local  drift. 
However,  if  the  goal  is  too  small,  then  the  convergence  time  will  become  quadratic  or 
worse,  as  the  strategy  must  rely  primarily  on  random  motions  rather  than  on  useful 
sensor  readings  to  attain  the  goal. 

Let us define p(a) to be the probability of obtaining a useful sensor reading
whenever the system is at location (a,0). A useful sensor reading is one for which
the system can execute a motion guaranteed to reduce its distance from the origin.
Clearly

$$\begin{aligned}
p(a) &= \int_{r_{\min}}^{r_{\max}} \int_{-\theta_r}^{\theta_r} p(r, \theta)\, d\theta\, dr \\
     &= \frac{1}{\pi\sigma^2}\int_{r_{\min}}^{r_{\max}} \theta_r\; r\, e^{-r^2/(2\sigma^2)}\, dr.
\end{aligned}$$
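
Since $\theta_r$ has no simple antiderivative in r, p(a) is most easily evaluated numerically. The sketch below uses a midpoint rule with $\sigma = \epsilon_s/3$ and $d = \epsilon_s/\sqrt{1 - \epsilon_v^2}$; the inverse-cosine argument is the one implied by (5.23), and the function names, sample values, and step count are illustrative assumptions.

```python
import math

def theta_r(a, r, d):
    """Half-width of the useful angular interval on the ring of radius r about (a, 0)."""
    if a - r <= -d or a - r >= d:
        return math.pi
    if a + r < d:
        return 0.0
    arg = (d * d - a * a - r * r) / (2.0 * a * r)
    return math.acos(max(-1.0, min(1.0, arg)))

def useful_probability(a, eps_s, eps_v, n=4000):
    """Midpoint-rule estimate of p(a), not renormalized by 0.989."""
    sigma = eps_s / 3.0
    d = eps_s / math.sqrt(1.0 - eps_v**2)
    if a + eps_s <= d:
        return 0.0                           # empty radius range in (5.24)
    r_min = 0.0 if a >= d else d - a
    h = (eps_s - r_min) / n
    total = 0.0
    for i in range(n):
        r = r_min + (i + 0.5) * h
        total += theta_r(a, r, d) * r * math.exp(-r * r / (2.0 * sigma * sigma))
    return total * h / (math.pi * sigma * sigma)

if __name__ == "__main__":
    for a in (0.5, 5.0, 10.0, 20.0, 100.0):
        print(a, useful_probability(a, 7.0, 0.5))
```

Far from the origin every ring is fully useful, so p(a) approaches $1 - e^{-9/2}$, the untruncated mass of the sensing ball.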

Suppose that the system is at location (a,0) and obtains a useful sensor reading
p*. Assume that the system executes a motion determined by equation (5.22) for time
$\Delta t$. Given this information, the discussion above says that the expected position after
execution of the motion, weighted by the probability of actually obtaining a useful
sensor reading, is given by:

$$\left(a + \frac{x(a)}{dt}\,\Delta t,\; 0\right).$$

In other words,

$$E[\Delta X \mid \text{useful sensor reading}]\; p(a) = \frac{x(a)\,\Delta t}{dt},$$
$$E[\Delta Y \mid \text{useful sensor reading}]\; p(a) = 0.$$


Variance  of  Positional  Change 

Let  us  also  compute  the  variance  of  the  change  in  each  coordinate,  assuming  that 
the  sensor  provides  a  useful  reading.  These  quantities  will  enable  us  to  compute  the 
infinitesimal  drift  and  variance  in  our  diffusion  approximation  to  the  sensing-guessing 
strategy. 

First, let us suppose that the commanded velocity is v, and let us compute the
expectations $E[(\Delta X)^2 \mid v]$ and $E[(\Delta Y)^2 \mid v]$. Assume that the velocity is executed for
time $\Delta t$ and that the velocity error is a two-dimensional normal variate with standard
deviation $\sigma_v |v|$. We are assuming as before that $\sigma_v = \frac{1}{3}\epsilon_v$, and that the position error
after execution of a motion for time $\Delta t$ is simply the velocity error scaled by $\Delta t$. See
the discussion of section 5.2.

Recall that in general, if Z is a random variable, then $E[Z^2] = \mathrm{VAR}[Z] + E[Z]^2$,
where VAR[Z] is the variance of Z. This is basically just the definition of the variance
of a random variable. In the following expressions, the commanded velocity is of the
form $v = (v_x, v_y)$. We thus have that

$$\begin{aligned}
E[(\Delta X)^2 \mid v] &= \mathrm{VAR}[\Delta X \mid v] + E[\Delta X \mid v]^2 \\
                       &= (\Delta t)^2\, \sigma_v^2\, |v|^2 + (\Delta t\, v_x)^2 \\
                       &= (\Delta t)^2 \left(|v|^2 \sigma_v^2 + v_x^2\right).
\end{aligned}$$

Similarly,

$$E[(\Delta Y)^2 \mid v] = (\Delta t)^2 \left(|v|^2 \sigma_v^2 + v_y^2\right).$$

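Recall the expression (5.21) for $t_{\max}(k)$. Thinking of k as a function of r and $\theta$
as given by equation (5.23), we can write the magnitude of the commanded velocity
corresponding to the sensed value $p^*(r, \theta)$ as $|v(r, \theta)| = t_{\max}(r, \theta)/dt$. Now let us
average over all possible sensor values and associated commands. Then

$$\begin{aligned}
E[(\Delta X)^2 \mid \text{useful sensor reading}]\; p(a)
  &= \int_{r_{\min}}^{r_{\max}} \int_{-\theta_r}^{\theta_r} E[(\Delta X)^2 \mid v(r, \theta)]\; p(r, \theta)\, d\theta\, dr \\
  &= \frac{(\Delta t)^2}{(dt)^2}\, \sigma_v^2 \int_{r_{\min}}^{r_{\max}} \int_{-\theta_r}^{\theta_r} [t_{\max}(r, \theta)]^2\; p(r, \theta)\, d\theta\, dr \\
  &\quad + \frac{(\Delta t)^2}{(dt)^2} \int_{r_{\min}}^{r_{\max}} \int_{-\theta_r}^{\theta_r} \frac{[t_{\max}(r, \theta)]^2\, [x(r, \theta)]^2}{|p^*|^2}\; p(r, \theta)\, d\theta\, dr.
\end{aligned}$$

We can, for appropriate definitions of $I(a)$ and $I_x(a)$, write this as

$$E[(\Delta X)^2 \mid \text{useful sensor reading}]\; p(a) = \sigma_v^2\, I(a) + I_x(a). \tag{5.27}$$

Similarly,

$$E[(\Delta Y)^2 \mid \text{useful sensor reading}]\; p(a) = \sigma_v^2\, I(a) + I_y(a). \tag{5.28}$$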

The important observation is that these expectations are proportional to $(\Delta t)^2$.
This means that if we pass to a diffusion approximation, the infinitesimal variances
will be zero. This is because one divides by $\Delta t$ in computing the infinitesimal
parameters, then allows $\Delta t$ to approach zero. The fact that the infinitesimal variances
approach zero means that the portion of the sensing-guessing strategy that results
from useful sensor values is essentially a deterministic process. This is due to our
assumption that the velocity error scales with $\Delta t$, rather than with $\sqrt{\Delta t}$ (see the
discussion in section 5.2). If instead we assumed that the velocity error was due to
white noise, then it would scale with $\sqrt{\Delta t}$. In that case the expressions (5.27) and
(5.28) above would be slightly different. Specifically, the coefficient of $I(a)$ would now
be proportional to $\Delta t$ rather than $(\Delta t)^2$. In passing to a diffusion approximation,
this says that the infinitesimal variance contains a term proportional to $I(a)$. It is
straightforward to perform the inner integral with respect to $\theta$ in the definition of
$I(a)$. Again, the outer integral with respect to r has no explicit representation. We
will not perform the integration here, but simply mention that the integral contains
a term proportional to $a^2$, as one would expect.

Infinitesimal  Parameters  of  an  Approximating  Diffusion  Process 

Having determined the expectation and variance of the change in position given that
the system obtains a useful sensor reading, let us now compute these quantities in
the general case, that is, for arbitrary sensor readings, assuming that the system is
at location (a,0) and has just taken a sensor reading. Recall that the variance of
the Brownian motion is $\sigma_B^2$. Recall further that p(a) is the probability of obtaining a
useful sensor reading when the system is at the location (a, 0).


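$$E[\Delta X] = E[\Delta X \mid \text{useful sensor reading}]\; p(a) + E[\Delta X \mid \text{Brownian motion}]\; (1 - p(a)). \tag{5.29}$$

So

$$E[\Delta X] = \frac{x(a)}{dt}\,\Delta t, \tag{5.30}$$
$$E[\Delta Y] = 0. \tag{5.31}$$

Similarly,

$$E[(\Delta X)^2] = \frac{(\Delta t)^2}{(dt)^2}\left[\sigma_v^2\, I(a) + I_x(a)\right] + (1 - p(a))\left[\Delta t\, \sigma_B^2 + o_x(\Delta t)\right],$$
$$E[(\Delta Y)^2] = \frac{(\Delta t)^2}{(dt)^2}\left[\sigma_v^2\, I(a) + I_y(a)\right] + (1 - p(a))\left[\Delta t\, \sigma_B^2 + o_y(\Delta t)\right],$$

where $o_x(\Delta t)$ and $o_y(\Delta t)$ contain terms of order less than $\Delta t$. It follows that the
infinitesimal drift and variance of an approximating diffusion process derived from
the sensing-guessing strategy are given by: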

$$\mu(a, 0) = \left(\frac{x(a)}{dt},\; 0\right), \qquad \sigma^2(a, 0) = \begin{pmatrix} \sigma_G^2(a) & 0 \\ 0 & \sigma_G^2(a) \end{pmatrix},$$

where

$$\sigma_G^2(a) = (1 - p(a))\, \sigma_B^2.$$

We  should  also  verify  that  the  higher-order  infinitesimal  moments  vanish,  but  this 
follows  in  a  straightforward  manner  from  the  results  for  Brownian  motions. 

A  Radial  Process 

The  behavior  of  the  sensing-guessing  strategy  is  radially  symmetric,  since  we  have 
assumed  that  the  sensing  and  control  errors  are  symmetric.  We  can  thus  think 
of  the  strategy  as  a  one-dimensional  process  on  the  positive  real  line.  We  will 
approximate  the  actual  sensing-guessing  strategy  by  a  diffusion  process.  Specifically, 
define $D(t) = \sqrt{X^2(t) + Y^2(t)}$, where $(X(t), Y(t))$ is the position of the system at
time t. Then D(t) is the distance from the origin at time t. In determining the
infinitesimal parameters of D(t) we will use an argument very similar to the one used
to establish the infinitesimal parameters of the Bessel process (see section 5.1.4 and
[KT2]).

Define, first of all, $Z(t) = X(t)^2 + Y(t)^2$. So $D(t) = \sqrt{Z(t)}$. As usual, we shall write
$X(t + \Delta t) = x + \Delta X$, where $x = X(t)$ is given at time t, and $\Delta t$ is the time between
sensing operations. Thus $\Delta X$ is a random variable. A similar notation is used for Y
and Z. $\Delta X$ and $\Delta Y$ are independent random variables. Modulo terms of order less
than $\Delta t$, both of these random variables have essentially normal distributions, with
variance $\Delta t\, \sigma_G^2$. Given that (x, y) = (a,0), $E[\Delta X]$ and $E[\Delta Y]$ are given by (5.30) and
(5.31) above. Observe that


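$$\begin{aligned}
\Delta Z &= X^2(t + \Delta t) + Y^2(t + \Delta t) - x^2 - y^2 \\
         &= x^2 + 2x\,\Delta X + (\Delta X)^2 + y^2 + 2y\,\Delta Y + (\Delta Y)^2 - x^2 - y^2 \\
         &= \left[(\Delta X)^2 + (\Delta Y)^2\right] + 2\left[x\,\Delta X + y\,\Delta Y\right].
\end{aligned}$$

By symmetry, we can assume without loss of generality that (x,y) = (a,0). Thus

$$\begin{aligned}
E[\Delta Z] &= E[(\Delta X)^2] + E[(\Delta Y)^2] + 2x\,E[\Delta X] + 2y\,E[\Delta Y] \\
            &= \Delta t\, \sigma_G^2 + o_x(\Delta t) + \Delta t\, \sigma_G^2 + o_y(\Delta t) + 2a\,\frac{x(a)}{dt}\,\Delta t,
\end{aligned}$$

where $o_x(\Delta t)$ and $o_y(\Delta t)$ contain terms of order less than $\Delta t$.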

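This tells us that the infinitesimal drift for the process Z is

$$\mu_Z(a^2) = 2a\,\frac{x(a)}{dt} + 2\sigma_G^2.$$

Let us now compute the terms for the infinitesimal variance of $\Delta Z$. The
computation makes use of the fact that the higher order infinitesimal moments for
the processes X and Y vanish, and the fact that $\Delta X$ and $\Delta Y$ are independent.

$$\begin{aligned}
E[(\Delta Z)^2] &= E\left[\left(2x\,\Delta X + (\Delta X)^2 + 2y\,\Delta Y + (\Delta Y)^2\right)^2\right] \\
  &= E\big[4x^2(\Delta X)^2 + (\Delta X)^4 + 4y^2(\Delta Y)^2 + (\Delta Y)^4 + 4x(\Delta X)^3 + 4y(\Delta Y)^3 \\
  &\qquad + 8xy\,\Delta X\,\Delta Y + 4x\,\Delta X(\Delta Y)^2 + 4y\,\Delta Y(\Delta X)^2 + 2(\Delta X)^2(\Delta Y)^2\big] \\
  &= 4\,E\left[x^2(\Delta X)^2 + y^2(\Delta Y)^2\right] + o(\Delta t) \\
  &= 4\,\sigma_G^2\, z\, \Delta t + o(\Delta t),
\end{aligned}$$

where $o(\Delta t)$ contains terms of order less than $\Delta t$, that is, terms proportional to $(\Delta t)^p$,
with $p > 1$. We see then that the infinitesimal variance of Z is given by

$$\sigma_Z^2(a^2) = 4\,\sigma_G^2\, a^2.$$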

Furthermore,  one  can  argue  that  the  higher-order  infinitesimal  moments  vanish,  since 
they  vanish  for  the  underlying  Brownian  motion  processes. 

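In order to determine the infinitesimal parameters of the process D, we will use
equations (5.10) and (5.11) from page 229, with $g(z) = \sqrt{z}$. Thus

$$\mu_D(a) = \frac{1}{2}\,\sigma_Z^2 \left(-\frac{1}{4}\, z^{-3/2}\right) + \mu_Z \left(\frac{1}{2}\, z^{-1/2}\right) \tag{5.32}$$
$$= -\frac{\sigma_G^2}{2a} + \frac{2a\, x(a)\,(1/dt) + 2\sigma_G^2}{2a}$$
$$= \frac{x(a)}{dt} + \frac{\sigma_G^2}{2a}$$
$$= \frac{x(a)}{dt} + \frac{(1 - p(a))\,\sigma_B^2}{2a}. \tag{5.33}$$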


This  expression  says  that  the  infinitesimal  drift  consists  of  two  terms,  one  pulling  the 
system  towards  the  origin,  the  other  pushing  it  away.  The  inward  pull  is  due  to  the 
sensor,  while  the  outward  drift  is  due  to  randomization.  This  outward  pull  arises  in 
the  same  manner  as  it  did  for  the  Bessel  process. 

Next, observe that

$$\sigma_D^2(a) = \sigma_Z^2(a^2)\left[g'(a^2)\right]^2 = 4\,\sigma_G^2\, a^2 \cdot \frac{1}{4a^2} = \sigma_G^2(a).$$

In other words, the infinitesimal variance is the same for the radial process as it
is coordinate-wise for the two-dimensional representation of the sensing-guessing
strategy.

Finally, let us observe that if the error arising from motions commanded in
response to useful sensor values is non-vanishing in the limit as $\Delta t$ goes to
zero, then one must add another term to the expression for the infinitesimal variance.
This term is proportional to the integral $I(a)$. The term carries over to the
expressions for $\mu_D$ and $\sigma_D^2$, effectively adding another outward pull to
the radial drift. This outward pull is essentially proportional to the distance from
the origin. Intuitively it arises from command errors in much the same way that an
outward drift arises from purely random motions.

The important observation to take from (5.33) is that the two terms have opposite
sign. Thus, there is some point $a_0$ for which $\mu_D(a_0) = 0$. If $a > a_0$, then
the net radial drift is negative, meaning that on average the system is moving towards
the origin. Conversely, if $a < a_0$, then the drift is positive, meaning that on
average the system is moving away from the origin. This says that if the goal radius
$r$ is bigger than $a_0$, then the system behaves almost like a deterministic process,
moving towards the goal with expected approach speed at least $|\mu_D(r)|$. Thus the
expected convergence time is essentially bounded by $-a_s/\mu_D(r)$, where $a_s$ is
the starting location of the system. On the other hand, if $r < a_0$, then the system
will act very much like a Brownian motion process, randomly walking about inside
the annulus $r < a < a_0$ until the goal is attained. The convergence times now become
slightly worse than quadratic in the distance from the origin.
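
As a rough illustration of how the zero crossing $a_0$ of the radial drift might be located numerically, the sketch below applies bisection to a toy drift. The model `toy_drift`, with a spring-like inward pull `-k * a` and a Bessel-like outward push `sigma2 / (2 * a)`, is an assumption made purely for illustration; it is not the drift (5.33), whose integrals have no closed form. The names `toy_drift`, `find_a0`, `k`, and `sigma2` are hypothetical.

```python
def toy_drift(a, k=0.5, sigma2=1.0):
    """Hypothetical radial drift: inward pull -k*a plus outward push sigma2/(2a)."""
    return -k * a + sigma2 / (2.0 * a)


def find_a0(drift, lo, hi, tol=1e-10):
    """Bisection for the radius at which the drift changes sign.

    Assumes drift(lo) > 0 (net outward) and drift(hi) < 0 (net inward).
    """
    if not (drift(lo) > 0.0 > drift(hi)):
        raise ValueError("drift must be positive at lo and negative at hi")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if drift(mid) > 0.0:
            lo = mid   # still drifting outward: a0 lies farther out
        else:
            hi = mid   # drifting inward (or zero): a0 lies closer in
    return 0.5 * (lo + hi)


a0 = find_a0(toy_drift, 0.1, 10.0)
```

For this toy model the root is available in closed form, $a_0 = \sqrt{\sigma^2/(2k)}$, which gives a check on the routine. Goal radii larger than the returned value fall in the drift-dominated regime described above.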


5.4.2  An  Example 

Solving for $a_0$ is in general a difficult task, since the expression (5.33) involves
several integrals that have no explicit analytic description. We will therefore consider
a simple numerical example. Suppose that the error parameters are given as follows.

Sensing Error: $\epsilon_s = 7$

Velocity Error: $\epsilon_v = 0.5$

Brownian Motion Variance: 1.0
Figure 5.6: Effective radial drift for the sensing-guessing strategy.


Figure 5.7: Component drifts for the sensing-guessing strategy, along with the
probability of obtaining a useful sensor reading.

Figures 5.6 and 5.7 indicate the resulting radial drift. Figure 5.7 resolves this drift
into the inward pull due to the sensor and the outward push due to randomization.
The figure also shows how the probability of obtaining a useful sensor reading increases
with the distance from the goal. The figures indicate that the value $a_0$ at which the
drift switches from negative to positive is around $a = 3.0$. It is interesting to note
that the value of $a_0$ is considerably less than $\epsilon_s$ for this example. In order
for a sensed value to be useful it has to lie outside the circle of radius
$d = \epsilon_s/\sqrt{1 - \epsilon_v^2}$. In this example $d \approx 8.1$. In order to
guarantee that a sensed value will lie far enough from the origin, one would have to
insist that the system be at least distance $d + \epsilon_s \approx 15.1$ from the
origin. Thus a strategy that wished to guarantee entry into the goal in a fixed number
of motions could do so only if the radius of the goal was at least 15.1. However, a
randomized strategy can guarantee eventual entry. Indeed, for the nice Gaussian sensor
distribution that we have assumed, sufficiently many sensor values lie outside the
circle of radius $d$ that the expected approach velocity points towards the origin
whenever the system is at least distance $a \approx 3$ from the origin. The difference
between $d + \epsilon_s \approx 15.1$ and $a \approx 3$ shows quite dramatically how a
randomized strategy can extend the convergence region of a goal beyond that provided by
a bounded-step guaranteed strategy.
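
The arithmetic behind the two radii quoted above can be reproduced in a few lines. This is only a sketch of the formula $d = \epsilon_s/\sqrt{1 - \epsilon_v^2}$ stated in the text; the function name `safe_distance` and the variable names are hypothetical.

```python
import math


def safe_distance(eps_s, eps_v):
    """Radius d beyond which a sensed value is useful: eps_s / sqrt(1 - eps_v**2)."""
    if not 0.0 <= eps_v < 1.0:
        raise ValueError("eps_v must lie in [0, 1)")
    return eps_s / math.sqrt(1.0 - eps_v ** 2)


eps_s, eps_v = 7.0, 0.5
d = safe_distance(eps_s, eps_v)      # about 8.08
guaranteed_radius = d + eps_s        # about 15.08
```

Rounded to one decimal place these are the values 8.1 and 15.1 used in the example.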

We  should  add  our  usual  caveat  to  these  observations.  The  strategy  could  be 
considerably  improved  for  the  particular  pair  of  sensing  and  control  errors  assumed 
in the analysis above. For instance, by always assuming that the sensor value $p^*$ is
correct, and issuing a commanded velocity of the form $v = -p^*/dt$, the expected
approach velocity could be made to point towards the goal for all positions of $a$,
not just for $a > 3$. This is because the sensing error has no bias. However, as we
have stated before, the strategy was designed to succeed for all error distributions
consistent with the bounds $\epsilon_s$ and $\epsilon_v$, not just unbiased Gaussian errors. A strategy
that  always  interpreted  the  current  sensed  value  as  correct  could  easily  converge  to 
the  wrong  location.  This  difficulty  was  demonstrated  in  figure  2.7  for  a  sensor  with 
a  fixed  but  unknown  bias.  Thus  we  have  employed  a  strategy  that  is  suboptimal  in 
the  presence  of  unbiased  Gaussian  errors,  but  that  still  converges  reasonably  quickly, 
and  more  importantly,  that  converges  for  all  possible  error  distributions. 

Convergence  Times 

Let  us  examine  the  expected  convergence  times  for  the  current  example.  In  section 
5.1.2  we  discussed  a  differential  equation  that  models  the  expected  convergence  time 
of  a  diffusion  process.  We  can  solve  this  equation  numerically  to  obtain  estimates  for 
the  convergence  times  of  the  sensing-guessing  strategy  for  various  goal  radii. 
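
The following is a minimal sketch of such a numerical solution, not the computation used to produce the figures. It assumes that equation (5.6) has the standard form for the expected hitting time $T(a)$ of a one-dimensional diffusion, $\frac{1}{2}\sigma^2(a)\,T''(a) + \mu(a)\,T'(a) = -1$, with $T(r) = 0$ at the goal and $T'(R) = 0$ at a reflecting barrier $R$. The function name `expected_hitting_times` and its parameters are hypothetical.

```python
def expected_hitting_times(mu, sigma2, r, R, n=2000):
    """Solve 0.5*sigma2(a)*T'' + mu(a)*T' = -1 on [r, R], T(r) = 0, T'(R) = 0.

    Returns (grid, T). The first-order equation for u = T' is integrated inward
    from the reflecting barrier with implicit Euler; T is then accumulated
    outward from the goal with the trapezoid rule.
    """
    h = (R - r) / n
    grid = [r + i * h for i in range(n + 1)]
    u = [0.0] * (n + 1)                  # u[n] = T'(R) = 0 (reflection)
    for i in range(n - 1, -1, -1):
        c = 2.0 / sigma2(grid[i])
        denom = 1.0 - h * c * mu(grid[i])
        if denom <= 0.0:
            raise ValueError("grid too coarse for this outward drift; increase n")
        u[i] = (u[i + 1] + h * c) / denom
    T = [0.0] * (n + 1)                  # T[0] = T(r) = 0 (absorption at the goal)
    for i in range(n):
        T[i + 1] = T[i] + 0.5 * h * (u[i] + u[i + 1])
    return grid, T


# Check against pure Brownian motion (no drift, unit variance) on [1, 3].
grid, T = expected_hitting_times(lambda a: 0.0, lambda a: 1.0, 1.0, 3.0)
```

For zero drift and constant variance the exact solution is $T(a) = (a - r)(2R - a - r)/\sigma^2$, a downward-opening quadratic, so the last grid value in the check above should be close to 4. Supplying a drift function tabulated from figure 5.6 would be one way to explore the cases discussed below.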

Figure 5.8 displays the numerical solution to the differential equation (5.6),
assuming that the goal is located at $a = 5$, and that the system reflects at $a = 12$.
The expected times to reach the goal seem to satisfy a downward-opening quadratic.
This is not surprising, given the spring-like behavior of the sensor. After all, the
expected approach velocity at a given point is almost proportional to the distance
from the origin. For these examples, the maximum cap on velocity magnitude was
indirectly given by using $dt = 0.1$. So, at a distance of $a$ units from the origin, the
expected approach velocity is no greater than $a/dt$ units per second, that is, $10a$. In
fact, it is often much less because not all sensor values provide a useful sensor reading.
For instance at $a = 8$, the expected approach velocity is approximately $-17.3$ (see
figure 5.6), whereas at $a = 12$ it is about $-66.1$.

Figure 5.8: Expected times to reach a goal of radius 5 from different starting locations,
for the sensing-guessing strategy. [Plot: expected convergence time versus starting
location $a$.]

Figure 5.9: Expected times to reach a goal of radius 1 from different starting locations,
for the sensing-guessing strategy. [Plot: expected convergence time versus starting
location $a$.]

The  quadratic  nature  of  the  convergence  times  may  seem  to  contradict  the  claim 
that  the  convergence  times  are  linear  in  the  distance  from  the  origin.  In  fact  there 
is  no  such  contradiction,  since  the  linearity  claim  is  simply  an  upper  bound  on 
the  convergence  times.  Since  the  expected  approach  velocity  does  increase  with 
the  distance  from  the  origin,  one  would  expect  the  actual  convergence  times  to  be 
considerably  less  than  the  predictions  made  by  the  linearity  bound.  Indeed,  if  one 
erected  a  line  tangent  to  the  curve  of  figure  5.8  at  the  point  a  =  5.0,  this  line  would 
represent  the  linear  upper  bound.  The  downward-opening  nature  of  the  curve  reflects 
the  fact  that  the  actual  performance  is  considerably  better. 

A visually more convincing argument is made by considering the convergence
times for a goal radius $r$ that lies inside the radius $a_0$. Recall that $a_0$ is the
location at which the expected approach velocity switches sign. In some sense $a_0$
represents an attraction point, since locally the expected infinitesimal velocity points
towards $a_0$. Thus if a goal has a smaller radius than $a_0$, then convergence is
guaranteed by the variance of the Brownian motion, not by the motions suggested by the
sensor. The greater the variance of the Brownian motion, the faster the convergence. For
the case $r = 1.0$, figure 5.9 shows the expected convergence times, assuming reflection
at $a = 8$. The curve is again similar to a quadratic, but the convergence times are
one to two orders of magnitude greater than they were for the case $r = 5$. Notice
that the segment from $a = 8$ to $a = 5$ appears nearly linear with respect to the scale
of the entire curve from $a = 8$ to $a = 1$. In other words, relative to the scale of this
problem, where the goal is at $r = 1$, the convergence times of the previous problem,
where the goal was at $r = 5$, are indeed nearly linear.

Figure 5.10: Expected times to reach a goal of radius 2 from different starting
locations, for the sensing-guessing strategy. [Plot: expected convergence time versus
starting location $a$.]

Finally,  figure  5.10  displays  the  convergence  times  for  another  problem  in  which 
r  <  a0.  In  this  case  r  =  2.  Again,  the  times  are  considerably  greater  than  for  the  case 
r  =  5  >  a0.  Furthermore,  comparing  this  figure  to  figure  5.9,  one  sees  how  dramatic 
is the difference between moving from $a = 3$ to $a = 2$ and moving from $a = 2$ to
a  =  1. 

5.4.3  Simulations 

We tested the sensing-guessing strategy in simulation. The results agree qualitatively
with those obtained from the analysis above. In particular, for the case in which the
goal radius is 5, and the starting location is at the point (12,0), the average time to
attain the goal, averaged over 1000 trials, was approximately 0.505. The minimum
and maximum times to attain the goal were 0.039 and 2.64, respectively, and the
experimentally obtained standard deviation was 0.365. The numerical results from
the data for figure 5.8 suggested an expected convergence time of approximately 0.61
in this case.

Similarly,  for  the  case  r  =  1,  with  a  starting  location  at  (8,0),  the  average 
time  to  attain  the  goal  was  9.14,  with  a  standard  deviation  of  7.86.  The  minimum 
and  maximum  times  were  0.116  and  58.2.  These  statistics  were  also  obtained  from 
1000 trials. The numerical results from the data for figure 5.9 suggest an expected
convergence  time  of  approximately  14  in  this  case. 

The  simulation  statistics  and  the  analytical/numerical  predictions  do  not  agree, 
except  in  terms  of  order  of  magnitude.  Part  of  this  is  due  to  the  fact  that  we 
assumed  a  pure  diffusion  process  for  the  analytical  results,  whereas  the  simulations 
were  implemented  as  discrete-time  processes,  with  a  time  step  that  was  on  the  order 
of  dt.  As  a  consequence,  the  variance  arising  from  command  errors  became  significant. 
Recall  that  we  assumed  that  the  variance  in  the  command  error  disappears  as  the 
time  step  approaches  zero.  A  larger  variance  implies  that  the  system  is  more  likely 
to  make  big  motions,  which  can  decrease  convergence  times.  Nonetheless,  as  a  first 
approximation to the qualitative behavior, the numerical results describe the
sensing-guessing strategy reasonably well. Indeed, upon taking $\Delta t = dt/100$, there
was a marked improvement in the results. For the case $r = 5$, the average over 1000
trials was 0.582. For the case $r = 1$, the average over 1000 trials was over 11.
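
A discrete-time simulation of this kind might be sketched as follows. This is not the simulator used for the statistics above, and it is not expected to reproduce those numbers. Several modelling choices are assumptions made only for illustration: the sensor noise is Gaussian with standard deviation `eps_s / 3` per coordinate, a useful reading $p^*$ commands the velocity $-p^*/dt$ with no velocity error, and a useless reading triggers a Brownian step of variance `sigma2 * step` per coordinate. The names `simulate_once` and `mean_time` are hypothetical.

```python
import math
import random


def simulate_once(start, goal_radius, rng, eps_s=7.0, eps_v=0.5,
                  sigma2=1.0, dt=0.1, step=0.001, max_steps=1_000_000):
    """One run of a simplified sensing-guessing loop.

    Returns the time at which the goal disc is reached, or None if
    max_steps is exhausted first.
    """
    d = eps_s / math.sqrt(1.0 - eps_v ** 2)    # readings beyond d are useful
    x, y = start
    for k in range(max_steps):
        if math.hypot(x, y) <= goal_radius:
            return k * step
        sx = x + rng.gauss(0.0, eps_s / 3.0)   # sensed position p* (assumed noise model)
        sy = y + rng.gauss(0.0, eps_s / 3.0)
        if math.hypot(sx, sy) >= d:
            # Sensor-guided motion: velocity -p*/dt applied for one time step.
            x -= sx * step / dt
            y -= sy * step / dt
        else:
            # Randomizing motion: one Brownian increment.
            s = math.sqrt(sigma2 * step)
            x += rng.gauss(0.0, s)
            y += rng.gauss(0.0, s)
    return None


def mean_time(trials, start, goal_radius, seed=0):
    """Average time to reach the goal over independent trials."""
    rng = random.Random(seed)
    times = []
    for _ in range(trials):
        t = simulate_once(start, goal_radius, rng)
        if t is None:
            raise RuntimeError("a trial failed to reach the goal within max_steps")
        times.append(t)
    return sum(times) / trials
```

A call such as `mean_time(1000, (12.0, 0.0), 5.0)` mirrors the first experiment in the text. The default `step` corresponds to the finer time step $\Delta t = dt/100$.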

Biases 

If  we  add  biases  to  the  sensing  or  control  errors,  then  the  problem  is  no  longer 
symmetric.  In  particular,  the  infinitesimal  drift  and  variance  depend  not  only  on  the 
distance from the origin but on the exact location $p = (x, y)$. The differential equation
(5.6)  describing  the  expected  time  to  attain  the  goal  is  thus  a  two-dimensional  partial 
differential  equation.  Rather  than  solve  this  equation  explicitly  or  numerically,  let  us 
try  to  obtain  a  qualitative  description  of  the  behavior  of  the  system. 

We  will  focus  on  sensing  biases.  That  is  because  a  sensing  bias  can  radically 
change  the  convergence  properties  of  a  region  near  the  goal.  In  particular,  as  we 
shall  see,  a  point  in  state  space  may  change  from  a  point  at  which  sensing  is  good 
enough  to  move  the  system  towards  the  goal  on  the  average,  into  a  point  at  which 
only  randomization  is  possible.  While  velocity  biases  can  also  change  convergence 
properties,  the  feedback  strategy  of  this  chapter  was  designed  to  make  progress  for 
all velocities in the velocity error cone. Thus the change effected by a velocity bias
manifests  itself  primarily  as  a  small  change  in  the  direction  (and  magnitude)  of  the 
infinitesimal  drift.  Locally  this  does  not  change  the  convergence  properties  of  points 
near  the  goal,  assuming  that  the  velocity  bias  is  small.  The  velocity  bias  clearly  may 
have  a  global  effect  since  changing  the  local  infinitesimal  drift  changes  the  natural 
paths  of  the  system.  The  analysis  of  such  global  changes  goes  beyond  the  scope  of 
this  thesis. 

It may be useful to consider again the example of section 2.4. In that example the
sensing error was given by a constant bias. The effect of this bias was to facilitate
goal attainment from certain approach directions, while preventing it from others. If
one introduces sensing biases into the simple feedback strategy of the current chapter,
the effect is similar. Effectively the bias shifts the sensing uncertainty ball. For some
states of the system this means that the observed sensor values are shifted away from
the origin, thus increasing the likelihood that the system will obtain a useful sensor
value. For other states, the observed sensor values are shifted towards the origin,
thereby preventing the system from knowing in which direction to move.

Figure 5.11: This figure indicates the effect of a sensing bias on the usefulness of
sensor values. A sensor value is useful if it lies outside the circle of radius $d$. In each
of Part A and Part B the actual state of the system is at the point $p$. The solid circle
about $p$ indicates the range of possible sensor values without any bias. The dashed
circle about the point $p + b$ indicates the actual range of sensor values, assuming a
bias $b$. In Part A the bias increases the range of useful sensor values, while in Part
B the bias decreases the range of useful sensor values.

First, imagine that the system is unaware of a bias in the sensing uncertainty.
Instead, the simple feedback loop operates as before on the assumption that the
sensing error ball has radius $\epsilon_s$ and that the velocity uncertainty is given by
$\epsilon_v$. Let $d = \epsilon_s/\sqrt{1 - \epsilon_v^2}$. Recall that this means that
whenever the system observes a sensor value that lies at least distance $d$ from the
origin, then it will execute a motion guaranteed to make progress towards the origin. If
the sensed value lies within distance $d$ of the origin, then the system executes a
randomizing motion. Now, let $p$ be the actual state of the system and let $b$ be the
unknown bias in the sensor. See figure 5.11. If there were no bias, the range of possible
sensor readings would be given by a ball of radius $\epsilon_s$ centered at $p$, that is,
by $B_{\epsilon_s}(p)$. With the bias $b$, the range of possible sensor values is shifted
by the bias, that is, it is given by the ball $B_{\epsilon_s}(p + b)$.

In  short,  the  behavior  of  the  feedback  loop  at  the  point  p,  assuming  an  unknown 
bias  b,  is  the  same  as  it  would  have  been  at  the  point  p  +  b  for  a  feedback  loop  in 
which  there  is  no  sensing  bias.  In  particular,  the  local  infinitesimal  drift  at  the  point 
p  in  the  biased  case  is  the  same  as  it  would  be  at  the  point  p  +  b  in  the  unbiased 
case.  Suppose  that  p  and  b  are  actually  parallel,  as  in  figure  5.11.  Then  in  the 
biased  system  the  expected  velocity  of  approach  is  increased  at  the  point  p  whenever 
$p \cdot b > 0$, and is decreased otherwise. Thus a system approaching the origin from a
direction  on  the  opposite  side  of  the  origin  relative  to  the  bias  must  quickly  resort 
to  randomization.  If  the  bias  is  reasonably  small  relative  to  the  size  of  the  goal  then 
this  is  not  a  permanent  problem.  Eventually,  as  the  system  drifts  around  the  goal, 
the  sensing  bias  begins  to  facilitate  goal  approach,  and  the  system  is  again  able  to 
rely  on  sensing  to  make  progress  towards  the  goal.  (See  again  the  example  of  section 
2.4  that  deals  with  the  case  of  sensing  error  due  purely  to  a  fixed  but  unknown  bias.) 
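
The shift argument can be made concrete with a small sketch that treats the sensing error as a bounded ball, as the strategy itself does. Possible readings at the true state $p$ under bias $b$ fill the ball of radius $\epsilon_s$ about $p + b$, and the question is how much of that ball lies at least distance $d$ from the origin. The function name `reading_usefulness` and the three labels it returns are hypothetical.

```python
import math


def reading_usefulness(p, b, eps_s, d):
    """Classify sensing at true state p under sensing bias b.

    Possible readings fill the ball of radius eps_s about p + b; a reading is
    useful when it lies at least distance d from the origin.
    """
    c = math.hypot(p[0] + b[0], p[1] + b[1])   # distance of the ball's centre from the origin
    if c - eps_s >= d:
        return "always"       # every possible reading is useful
    if c + eps_s < d:
        return "never"        # no possible reading is useful
    return "sometimes"


d = 7.0 / math.sqrt(1.0 - 0.5 ** 2)            # about 8.08, as in the example of section 5.4.2
same_side = reading_usefulness((1.0, 0.0), (1.0, 0.0), 7.0, d)    # bias shifts readings outward
far_side = reading_usefulness((1.0, 0.0), (-1.0, 0.0), 7.0, d)    # bias shifts readings inward
```

With these illustrative values the state $(1, 0)$ can still obtain useful readings when the bias points away from the origin, but obtains none when the bias points towards it, so the system must fall back on randomization.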

Thus  far  we  have  assumed  that  the  system  is  unaware  of  the  existence  of  a  bias. 
If  in  fact  the  maximum  possible  magnitude  of  the  bias  b  is  known  to  the  system, 
but  not  the  actual  direction,  then  a  safe  strategy  is  to  augment  the  effective  sensing 
uncertainty radius from $\epsilon_s$ to $\epsilon_s + b_{\max}$. This increases the value of the safe distance
d.  As  a  result,  the  range  of  useful  sensor  values  at  any  state  is  reduced.  This  means 
that  the  infinitesimal  drift  towards  the  origin  is  decreased  in  magnitude.  Indeed,  for 
some  states,  sensing  may  no  longer  be  of  any  use. 
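
As a worked illustration, take the error parameters of the example in section 5.4.2, $\epsilon_s = 7$ and $\epsilon_v = 0.5$, together with a purely hypothetical bias bound $b_{\max} = 1$. The augmented safe distance, written here as $d'$ (a symbol introduced only for this illustration), becomes

```latex
\[
  d' \;=\; \frac{\epsilon_s + b_{\max}}{\sqrt{1 - \epsilon_v^2}}
     \;=\; \frac{8}{\sqrt{0.75}} \;\approx\; 9.2,
  \qquad\text{compared with}\qquad
  d \;=\; \frac{7}{\sqrt{0.75}} \;\approx\; 8.1 .
\]
```

Sensed values between roughly 8.1 and 9.2 from the origin, which were useful before, now trigger a randomizing motion instead.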

In  summary,  we  see  that  a  sensing  bias  changes  the  convergence  properties  of 
points  near  the  goal.  In  particular,  there  are  preferred  directions  of  approach,  namely 
those  that  are  roughly  on  the  same  side  of  the  goal  as  the  direction  given  by  the 
sensing  bias.  If  the  sensing  bias  is  small,  then  the  system  can  safely  ignore  the  bias. 
If  the  bias  is  large,  then  its  maximum  magnitude  should  be  incorporated  into  the 
decision  loop  that  ensures  safe  progress  towards  the  goal. 


5.5  Summary 

This  chapter  analyzed  in  detail  a  simple  feedback  loop.  The  task  consisted  of  moving 
a  point  in  the  plane  into  a  circular  region  at  the  origin,  in  the  presence  of  control  and 
sensing  uncertainty.  Such  a  task  might  correspond  to  the  problem  of  inserting  a  peg 
into  a  hole  by  sliding  the  peg  on  a  surface  surrounding  the  hole.  The  strategy  was 
stated  without  assuming  any  particular  form  of  error  distribution.  Both  the  control 
and  sensing  uncertainty  were  merely  represented  as  bounded  error  balls. 

The  strategy  consisted  of  a  combination  of  sensor-based  motions  and  random 
motions.  Repeatedly,  the  system  would  sense  its  current  position,  then  execute  a 
motion  for  a  short  duration  of  time.  Whenever  the  sensed  position  was  sufficiently  far
from  the  origin,  the  system  would  execute  a  motion  guaranteed  to  reduce  its  distance 
from  the  origin  for  all  possible  interpretations  of  the  sensed  position.  Otherwise,  the 
system  would  execute  a  random  motion.  The  purpose  of  the  random  motion  was  to 
move  either  to  a  location  from  which  the  sensor  could  again  provide  useful  information 
or  to  attain  the  goal  fortuitously. 

The  randomized  strategy  was  formulated  to  succeed  independent  of  the  actual 
error  distributions,  so  long  as  these  distributions  satisfied  certain  bounds.  The 
randomizing  aspect  of  the  strategy  ensures  this  success.  The  convergence  time  of  the 
strategy,  however,  depends  intimately  on  the  actual  error  distributions.  The  strategy 
was  analyzed  for  a  particularly  nice  pair  of  error  distributions,  namely  unbiased 
Gaussian  errors  in  both  sensing  and  control.  This  analysis  involved  modelling  the 
behavior  of  the  strategy  as  a  diffusion  process.  The  resulting  diffusion  approximation 
determined  a  range  of  goal  radii  for  which  the  strategy  converged  quickly.  In  contrast, 
a  strategy  that  must  guarantee  entry  into  the  goal  within  a  fixed  number  of  steps 
would  consider  the  problem  unsolvable  for  many  of  these  goals,  namely  those  goals 
with  small  radii.  One  may  conclude  that  randomization  offers  a  reasonable  approach 
for  extending  the  class  of  solvable  tasks  beyond  those  considered  solvable  by  bounded- 
step  guaranteed  strategies. 
Chapter  6


Conclusions  and  Open  Questions 


6.1  Synopsis  and  Issues 

Randomization  and  Task  Solvability 

The  main  goal  of  this  thesis  has  been  to  demonstrate  how  randomized  strategies  can 
extend  the  class  of  tasks  considered  to  be  solvable.  The  basic  idea  is  to  place  a 
loop  around  a  set  of  strategies,  each  of  which  is  guaranteed  to  accomplish  a  task  if 
certain  preconditions  are  satisfied.  The  purpose  of  the  loop  is  to  repeatedly  choose 
and  execute  one  such  strategy,  in  the  hope  of  eventually  choosing  a  strategy  that  will 
actually  attain  the  goal.  In  making  its  choice,  the  system  executes  a  strategy  whose 
preconditions  are  satisfied,  should  the  system  ever  be  fortunate  enough  to  knowingly 
satisfy  the  preconditions  of  some  strategy.  Generally,  however,  the  preconditions  may 
be  too  stringent  to  be  satisfied  knowingly.  In  that  case,  the  system  randomly  selects 
a  strategy.  Eventually  the  system  will  guess  correctly  and  accomplish  its  task. 
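
The loop just described can be sketched in a few lines. This is an illustrative skeleton under simplifying assumptions, not the formalism of the thesis: strategies are opaque identifiers, `knowingly_applicable` stands for the system's ability to verify a strategy's preconditions, and `execute` reports whether the goal was attained. All of these names are hypothetical.

```python
import random


def randomized_loop(strategies, knowingly_applicable, execute, rng, max_rounds=10_000):
    """Repeatedly pick and run a strategy until one attains the goal.

    strategies: list of strategy identifiers.
    knowingly_applicable(s): True if the system knows that s's preconditions hold.
    execute(s): runs s and returns True if the goal was attained.
    Returns the number of rounds used, or None if max_rounds is exhausted.
    """
    for rounds in range(1, max_rounds + 1):
        known = [s for s in strategies if knowingly_applicable(s)]
        # Prefer a strategy whose preconditions are knowingly satisfied;
        # otherwise guess at random.
        choice = known[0] if known else rng.choice(strategies)
        if execute(choice):
            return rounds
    return None


# Toy task: exactly one of five strategies works, and the system cannot tell which.
hidden = 3
rounds = randomized_loop(
    strategies=list(range(5)),
    knowingly_applicable=lambda s: False,
    execute=lambda s: s == hidden,
    rng=random.Random(0),
)
```

In the toy task every round is a blind guess, so success is only eventual. When some precondition can be verified, the loop takes the guaranteed strategy immediately.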

Synthesizing  Randomized  Strategies 

The thesis extended a formalism for generating guaranteed plans to include
randomizing  decisions  and  actions.  Of  particular  interest  were  tasks  for  which  there 
existed  strategies  that  would  locally  make  progress  on  the  average,  relative  to  some 
progress  measure  defined  on  the  system’s  state  space.  It  was  shown  that  any  strategy 
whose  behavior  may  be  modelled  as  a  Markov  chain  inherently  defines  a  progress 
measure  relative  to  which  it  makes  progress.  The  complementary  problem,  of  finding 
a  useful  progress  measure  for  a  given  task,  is  more  difficult.  Sometimes  distance 
provides  a  natural  progress  measure,  but  generally  a  strategy  will  only  make  progress 
on  some  subset  of  the  state  space  for  such  a  progress  measure.  An  interesting  question 
is  whether  it  is  possible  to  transform  a  task  description  into  a  progress  measure  from 
which  one  can  build  a  fast  randomized  strategy.  In  general  one  suspects  that  this 
problem  is  no  easier  than  the  problem  of  finding  guaranteed  strategies  or  optimal 
strategies.  However,  for  certain  classes  of  tasks  an  advantage  may  be  gained  by 
viewing  the  task  in  terms  of  progress  measures  and  nominal  plans.  This  is  an  open 
area.

More  Extensive  Knowledge  States 

Although  the  thesis  developed  the  randomizing  formalism  in  some  generality,  the 
specific  examples  considered  were  essentially  assembly  operations  involving  the 
attainment  of  two-dimensional  regions,  assuming  position  sensors  and  first-order 
dynamics.  An  interesting  project  would  be  to  include  force  information  in  the 
randomizing  decisions,  to  make  extended  use  of  history,  and  to  consider  more 
complicated  tasks.  A  related  question  is  whether  anything  is  to  be  gained  by  defining 
progress  measures  on  the  space  of  knowledge  states.  This  is  the  natural  setting  for 
such  measures  once  a  strategy  retains  history  in  making  decisions. 

Reducing  Brittleness 

One  view  of  randomized  strategies  is  that  they  provide  a  means  for  reducing  the 
sensitivity  of  a  task  solution  to  initial  conditions.  After  all,  the  whole  approach  is 
based  on  not  knowing  exactly  which  preconditions  are  satisfied.  This  view  may  be 
carried  further  to  include  other  parameters  of  the  system.  The  example  of  section 
2.4  showed  how  the  sensitivity  to  sensing  biases  could  be  avoided  by  executing 
randomizing  motions,  albeit  at  the  cost  of  increased  convergence  time.  Other 
parameters,  such  as  the  shape  and  location  of  objects  in  the  environment,  or  the 
specification  of  the  dynamics,  may  also  be  subject  to  uncertainty.  It  is  desirable 
to  construct  strategies  that  need  not  know  precisely  the  values  of  these  parameters. 
Donald’s  [Don89]  work  on  model  error  forms  a  natural  domain  in  which  to  explore 
randomizing  approaches  for  dealing  with  uncertainties  in  the  task  specification  itself. 
See also [KR]. An interesting open question is whether it is possible to build general
strategies  from  simple  and  incomplete  task  descriptions.  Randomization  may  provide 
part  of  the  answer  via  its  ability  to  blur  the  sensitivity  to  detail. 


6.2  Applications 

Chapter  2  discussed  some  of  the  intended  applications  of  randomized  strategies.  The 
assembly  and  manipulation  of  objects,  mobile  robot  navigation,  and  the  design  of 
parts  and  sensors  are  broad  domains  of  applicability.  Let  us  now  relate  some  of  the 
results  of  the  thesis  to  these  domains. 

6.2.1  Assembly 

A  Formal  Framework  For  Existing  Applications 

Randomization  plays  an  important  role  in  assembly  operations.  Randomization 
appears  naturally  in  the  form  of  noise,  both  in  sensing  and  control.  Furthermore, 
it  is  sometimes  added  purposefully  in  the  execution  of  assembly  strategies.  Vibrating
a  part  in  order  to  overcome  stiction  is  a  common  example.  Spiral  searches  to  locate
some  feature,  while  implemented  deterministically,  are  similar  to  randomization  both 
in  their  intent,  namely  to  compensate  for  unknown  initial  conditions,  as  well  as  in  their 
execution,  due  to  control  uncertainty.  Finally,  vibratory  bowl  feeders  actively  make 
use  of  randomization  by  purposefully  tossing  an  improperly  oriented  part  back  into 
the  bottom  of  the  bowl.  The  intent  is  to  obtain  probabilistically  a  better  orientation 
of  the  part  on  its  next  pass  through  the  bowl’s  orienting  track. 

We  see  therefore  that  randomization  is  a  useful  tool  present  in  the  solution  of 
established  manipulation  and  assembly  tasks.  One  contribution  of  this  thesis  has 
been  to  provide  a  formal  basis  for  the  use  of  randomization.  In  particular,  the  thesis 
developed  a  framework  for  synthesizing  randomized  strategies.  Within  this  framework 
randomization  may  be  viewed  as  simply  another  operator,  along  with  the  operators 
of  sensing  and  action.  All  three  operators  are  essential  to  the  solution  of  general 
assembly  tasks. 

Utilizing  Available  Sensors 

One  of  the  themes  of  the  thesis  was  to  explore  the  conditions  under  which  progress 
towards  task  completion  is  possible  on  average.  We  implemented  a  peg-in-hole  task 
using  a  simple  camera  system  to  sense  position,  and  we  analyzed  the  convergence 
properties  of  a  simple  feedback  loop  with  a  position  sensor  subject  to  unbiased 
Gaussian  error.  In  both  cases  the  task  strategy  would  make  use  of  the  position 
sensor  when  the  sensor  provided  information  that  permitted  progress  towards  the 
goal,  and  otherwise  the  strategy  would  execute  a  random  motion.  This  combination 
of  sensing  and  randomization  allowed  the  task  to  be  solved  probabilistically  under 
conditions for which no bounded-step guaranteed strategy existed. Not only did the
randomization  ensure  eventual  convergence,  but  for  a  wide  range  of  initial  conditions 
the  sensing  information  ensured  that  the  convergence  was  actually  rapid. 

The  moral  to  be  taken  from  these  examples  is  that  a  position  sensor  can 
provide  considerably  more  information  than  is  made  use  of  in  a  bounded-step 
guaranteed  strategy.  While  this  information  cannot  always  be  interpreted  correctly 
in  a  guaranteed  sense,  the  combination  of  randomization  and  sensing  can  in  many 
instances  naturally  sort  out  the  useful  from  the  useless  sensor  readings.  For  instance, 
by  randomizing  its  position,  a  system  can  compensate  for  unknown  sensing  biases, 
and in some instances can even position itself to take advantage of the biases.

Using  Additional  Sensors 

Ultimately  one  should  explore  more  complex  sensors.  In  particular,  it  is  clear  that 
force  sensors  are  useful  in  disambiguating  contact  conditions.  [Sim79]  points  out 
that  the  extra  information  to  be  gained  from  position  sensors  by  using  probabilistic 
techniques,  such  as  Kalman  filters,  produces  estimates  with  the  same  order  of 
magnitude  in  precision  as  the  sensors  themselves.  In  contrast,  two  orders  of  magnitude 
of  improved  precision  are  usually  required  to  meet  standard  clearance  ratios  of 
assembled  parts.  By  adding  force  sensors  one  can  enhance  greatly  the  net  sensing
precision.  In  terms  of  randomized  strategies  and  simple  feedback  loops,  this  barrier 
to  the  improvement  in  the  precision  of  position  sensors  alone  makes  itself  visible  in 
the  direction  of  the  expected  motion  of  the  system.  Ultimately,  as  the  system  begins 
to  operate  below  the  resolution  of  its  sensors,  the  randomizing  aspect  of  the  strategy 
dominates  the  sensing  information,  and  the  system  drifts  away  from  the  goal  on 
average.  An  unexplored  question  is  how  the  addition  of  force  sensors  could  be  used 
to  improve  the  convergence  of  a  randomizing  feedback  loop. 

Eventual  Convergence  in  the  Context  of  Grasping 

Despite  the  advantage  of  better  sensors  in  terms  of  improved  precision,  the  sensors 
can  sometimes  be  difficult  to  interpret.  For  instance,  consider  a  multi-fingered  hand 
equipped  with  torque  sensors  at  each  of  several  tendon-controlling  motors.  A  set  of 
torque  readings  from  these  sensors  may  be  difficult  to  map  back  onto  an  interpretation 
in  the  world.  Fortunately,  better  sensors  are  not  required  in  a  strict  sense  to  ensure 
goal  attainment.  Randomization  ensures  eventual  goal  attainment.  [This  assumes 
that  the  randomization  is  so  chosen  as  to  cover  the  space  of  interest  in  finite  time,  and 
that  the  goal  is  recognizable].  Again,  the  point  is  that  a  randomized  strategy  makes 
use  of  sensing  information  when  it  can,  but  does  not  stop  cold  once  this  information 
ceases  to  be  useful.  This  is  an  important  property. 

In  the  multi-fingered  hand  example,  the  task  might  consist  of  grasping  a  part 
stably.  If  the  positions  of  the  fingers  relative  to  the  part  are  not  known  precisely,  or  if 
the  dynamic  properties  of  the  part  itself  are  not  known  precisely,  then  it  may  not  be 
possible  to  grasp  the  part  stably  on  a  first  attempt.  For  instance,  the  center  of  mass 
might  be  in  an  unexpected  location.  While  one  can  imagine  a  series  of  test  operations 
based  on  force  information  to  ascertain  the  dynamic  properties  of  the  part,  such  a 
battery  of  tests  may  not  be  feasible,  due  perhaps  to  a  lack  of  sensors  or  an  inability 
to  interpret  them.  If  this  is  the  case,  a  simpler  strategy  might  consist  of  grasping  the 
part  by  randomly  selecting  a  grasp  configuration  from  a  set  of  grasp  configurations, 
where  the  set  has  been  chosen  to  contain  the  desired  but  unknown  grasp.  Although 
the  robot  may  drop  the  part  a  few  times,  eventually  it  will  select  the  correct  grasp 
configuration,  and  the  task  will  be  accomplished. 
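
The random grasp selection just described can be sketched in a few lines. This is only an illustration under stated assumptions: the candidate grasps are modeled as integers, exactly one of them is stable, and the predicate `is_stable` stands in for whatever goal recognition the robot has (for instance, noticing that the part was not dropped). None of these names come from the thesis.

```python
import random

def attempts_until_stable_grasp(candidates, is_stable, rng):
    """Pick grasps uniformly at random (with replacement) until one holds.

    Returns the successful grasp and the number of attempts made.
    Assumes at least one candidate satisfies is_stable, otherwise this loops forever.
    """
    attempts = 0
    while True:
        attempts += 1
        grasp = rng.choice(candidates)   # randomizing step: no sensing is used here
        if is_stable(grasp):             # goal recognition: the part stayed in the hand
            return grasp, attempts

def mean_attempts(num_candidates, correct, trials, rng):
    """Average number of attempts over repeated independent runs."""
    candidates = list(range(num_candidates))   # hypothetical grasp configurations
    total = 0
    for _ in range(trials):
        _, attempts = attempts_until_stable_grasp(
            candidates, lambda g: g == correct, rng)
        total += attempts
    return total / trials
```

With k candidates of which exactly one is stable, the number of attempts is geometrically distributed with mean k, so `mean_attempts(20, 13, 10000, random.Random(0))` should come out near 20. Selecting without replacement would do better, but requires the system to retain history.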

From  a  practical  point  of  view  this  discussion  suggests  that  one  need  not  rely 
heavily  on  complicated  sensors.  We  know  from  the  work  on  sensorless  manipulation 
(see  [Mas85])  that  task  mechanics  and  predictive  ability  can  often  be  used  to  solve 
tasks  well  below  the  resolution  of  available  sensors.  The  thesis  suggests  that  another 
approach  is  to  use  randomization. 

Some  Assembly  Tasks 

Some  other  classes  of  tasks  in  which  randomization  is  useful  include: 

•  Parts  Orienting.  Many  parts,  in  particular,  polyhedral  parts,  will  assume 
one  of  a  small  set  of  configurations  when  dropped  onto  a  tabletop  under  the 
influence  of  gravity.  One  approach  for  orienting  a  part  is  to  drop  it  onto  a  table,
then  perhaps  shake  the  table  or  the  part  until  the  part  winds  up  in  the  desired 
configuration.  The  advantage  of  this  approach  is  that  it  reduces  the  sensing  and 
manipulation  requirements  of  the  orienting  system.  Instead  of  being  required  to 
orient  the  part  from  a  possibly  arbitrary  configuration,  the  system  need  simply 
be  able  to  randomize  the  part’s  configuration  sufficiently  to  ensure  that  the 
desired  orientation  is  achieved.  Additionally,  the  system  need  simply  be  able 
to  recognize  the  part  in  its  goal  orientation.  The  disadvantage  of  this  approach 
is  that  it  may  require  a  long  time  to  succeed  if  the  desired  orientation  is  one 
that  occurs  infrequently  when  the  part  is  dropped.  More  work  is  required  on 
investigating  the  usefulness  of  this  approach.  Again,  we  mention  the  vibratory 
bowl  feeder  as  a  paradigm  similar  to  this  approach  for  orienting  parts.  [See 
[BRPM]  in  this  context.] 

In  terms  of  the  thesis  these  operations  correspond  to  the  nearly  sensorless  tasks 
discussed  in  section  3.13.  Sensing  is  used  mainly  to  signal  goal  attainment,  while 
randomization  is  used  to  ensure  eventual  convergence.  It  is  up  to  a  planning 
system  that  understands  the  mechanics  of  the  domain,  in  this  case  the  dynamics 
of  dropped  parts,  to  suggest  a  sufficient  set  of  randomizing  motions. 

•  Fine  Motions.  One  of  the  applications  of  randomization  is  in  the  final  phase  of 
a  complex  operation.  Generally  the  available  control  and  sensing  system  will  be 
good  enough  to  complete  the  gross  motion  operations  of  the  task,  but  the  fine 
motions  may  be  difficult  to  control  or  observe.  A  simple  example  in  the  human 
domain  is  given  by  the  task  of  opening  an  electric  car  window  to  a  desired 
width  in  order  to  adjust  the  airflow  to  the  rear  passengers  to  a  comfortable 
level.  It  is  impossible  generally  to  position  the  window  precisely  on  a  single 
attempt.  Indeed,  the  precise  opening  may  not  even  be  known  ahead  of  time. 
By  randomly  moving  the  window  back  and  forth  about  the  desired  opening,  one 
can  quickly  open  the  window  properly. 

Another  example  is  given  by  the  adjustment  of  interior  wall  sections  during  the 
construction  of  a  house.  Once  a  wall  segment  has  been  erected  vertically,  it  is 
nearly  impossible  to  execute  any  precise  motions.  This  is  because  the  wall  is 
wedged  tightly  between  the  ceiling  and  the  floor.  Nonetheless,  precise  motions 
are  required  to  ensure  that  the  wall  segment  is  oriented  properly  in  the  vertical 
and  horizontal  directions.  The  standard  approach  is  to  tap  portions  of  the  wall 
with  a  large  hammer,  then  consult  a  scale  or  plumb  to  determine  the  orientation 
of  the  wall  segment.  The  effect  of  the  tapping  operations  is  to  produce  a  random 
walk  about  the  desired  orientation.  The  scale  or  plumb  plays  the  role  of  a  sensor 
that  serves  both  to  indicate  the  desired  direction  of  motion  as  well  as  to  signal 
goal  attainment. 

Within  the  domain  of  assembly  of  nearly-rigid  parts  there  are  numerous 
examples  that  share  common  characteristics  with  these  two  examples  from  the
human  world.  Tapping  parts  that  are  slightly  wedged  is  a  common  operation.
Another  common  operation  is  searching  for  a  pin  or  hole  prior  to  a  mating  task. 

The  results  of  this  thesis  suggest  that  goal  convergence  is  rapid  if  progress 
towards  the  desired  set  point  can  be  made  on  average.  Goal  convergence  in 
this  case  means  attaining  some  small  region  about  the  set  point.  If  we  take 
the  simple  feedback  loop  of  chapter  5  as  a  guide,  one  approach  for  obtaining 
average  progress  is  to  execute  motions  whose  magnitude  is  nearly  proportional 
to  the  sensed  distance  from  the  goal.  This  corresponds  to  the  intuitive  idea 
of  moving  quickly  towards  the  goal  when  one  is  far  away,  and  moving  slowly 
otherwise.  In  the  window  example  one  modulates  the  time  interval  during  which 
the  window  is  being  either  opened  or  closed,  while  in  the  wall-tapping  example 
one  modulates  the  impulse  of  the  tapping  operations.  Once  the  window  is  near 
the  desired  opening  or  the  wall  is  nearly  vertical,  then  it  may  become  difficult 
to  control  the  velocity  of  the  system  finely  enough  to  ensure  average  progress. 
As  in  the  feedback  example  of  chapter  5,  once  the  system  is  close  to  the  goal,
it  effectively  relies  entirely  on  randomization  to  attain  the  goal. 
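
The fine-motion behavior described above can be illustrated with a one-dimensional sketch. It is not the feedback loop of chapter 5; it is a simplified stand-in under the following assumptions: the set point is at 0, the sensor error is uniform within `sense_err`, control is perfect, and the goal region is recognizable. When the sensed value exceeds the sensing error, the sign of the true position is certain and the system moves by an amount proportional to the guaranteed distance; otherwise it executes a random kick. All parameter names are hypothetical.

```python
import random

def feedback_loop(x0, goal_radius, sense_err, gain, kick, rng, max_steps=100000):
    """Return the number of steps taken to reach |x| <= goal_radius, or None.

    x0          : initial signed distance from the set point
    sense_err   : bound on the (uniform) sensing error
    gain        : fraction (0 < gain <= 1) of the guaranteed distance moved per step
    kick        : magnitude of the randomizing motion
    """
    x = x0
    for step in range(max_steps):
        if abs(x) <= goal_radius:            # assumed: goal attainment is recognizable
            return step
        sensed = x + rng.uniform(-sense_err, sense_err)
        if abs(sensed) > sense_err:
            # The sign of x is certain and |x| >= |sensed| - sense_err, so a move of
            # this size makes progress without overshooting the set point.
            direction = -1.0 if sensed > 0 else 1.0
            x += direction * gain * (abs(sensed) - sense_err)
        else:
            # Sensing cannot determine the direction: randomize.
            x += rng.choice((-kick, kick))
    return None
```

For example, `feedback_loop(10.0, 0.1, 1.0, 0.5, 0.05, random.Random(1))` starts far from the goal, closes most of the distance with a few large sensed moves, and then relies increasingly on the random kicks once it is within the sensing resolution.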


6.2.2  Mobile  Robots 

An  important  characteristic  of  mobile  robots  is  their  existence  in  an  uncertain  world. 
Not  only  is  the  robot's  initial  model  of  the  world  incomplete  or  inaccurate,  but  the
world  itself  is  changing  as  people  and  objects  move  about.  Uncertainty  is  thus  a 
fundamental  characteristic  of  the  mobile  robot  domain. 

There  is  considerable  room  for  work  in  applying  randomizing  techniques  to  mobile 
robots.  Promising  areas  include  navigation,  map  building,  and  feature  recognition. 

Randomization  for  navigation  can  help  reduce  the  knowledge  requirements  of  a 
robot.  Robots  that  use  local  algorithms  in  making  decisions  about  global  navigation 
may  become  trapped  in  some  deterministic  state  or  cycle  of  states.  Randomization 
can  prevent  this  trap  from  persisting  forever.  Even  locally  this  may  be  useful,  for 
instance  when  a  robot  finds  itself  in  a  tight  corner,  unable  to  determine  the  proper 
direction  to  turn  in  order  to  escape.  Another  example,  taken  from  probabilistic 
broadcast  networks,  is  given  by  the  problem  of  several  identical  robots  meeting  at  the 
intersection  of  two  or  more  hallways.  If  right  of  way  rules  are  unclear  or  inapplicable,
it  makes  sense  to  arbitrate  these  right  of  way  rules  by  randomization.  Each  robot
simply  executes  a  strategy  that  randomly  and  repeatedly  tries  to  proceed  through
the  intersection  or  gives  way  for  another  robot  to  proceed.
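
A minimal simulation of this arbitration scheme follows, in the spirit of contention resolution on a shared broadcast channel. It is a sketch under simplifying assumptions that are not in the text: time is divided into rounds, each waiting robot independently tries to proceed with a fixed probability `p_go`, and a round lets one robot through only if exactly one robot tried.

```python
import random

def rounds_until_clear(n_robots, p_go, rng, max_rounds=100000):
    """Number of rounds until every robot has passed through the intersection.

    In each round every waiting robot tries to proceed with probability p_go.
    If exactly one robot tries, it passes; otherwise all robots back off.
    Returns max_rounds if the intersection has not cleared by then.
    """
    waiting = n_robots
    rounds = 0
    while waiting > 0 and rounds < max_rounds:
        rounds += 1
        movers = sum(1 for _ in range(waiting) if rng.random() < p_go)
        if movers == 1:          # no conflict: one robot gets the right of way
            waiting -= 1
    return rounds
```

Because the robots are identical, no deterministic rule can break the symmetry among them; the independent random choices do so with probability one, at the cost of some wasted rounds.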

Randomization  may  be  of  use  in  map  building,  by  weakening  the  requirement 
for  accurate  maps.  This  is  a  difficult  area  of  research,  with  potentially  promising 
results.  A  possible  approach  is  to  view  a  map  as  one  would  a  noisy  sensor  reading. 
Some  portions  of  the  map  provide  clearly  useful  information,  while  others  do  not. 
Randomization  is  used  to  compensate  for  the  incomplete  or  inaccurate  portions  of  the 
map.  This  is  an  application  of  randomization  as  a  means  of  blurring  environmental
details.  As  a  trite  example,  suppose  that  a  robot  is  unsure  which  offices  along  a 
hallway  house  graduate  students  and  which  house  professors.  Indeed,  the  state  of  the
offices  might  actually  be  in  flux  over  time.  A  map  might  nonetheless  contain  enough 
information  to  depict  the  topology  of  the  office  building  as  well  as  the  ratio  of  graduate 
students  to  professors.  The  robot  could  then  use  this  information  to  randomly  select 
an  office  in  such  a  way  as  to  maximize  the  probability  of  encountering  a  professor. 
A  similar  problem  is  given  by  the  task  of  finding  a  free  Xerox  machine  in  a  building 
in  which  there  are  varying  numbers  of  machines  on  each  floor,  not  all  of  which  are 
necessarily  free  or  working.  This  is  a  classic  problem  out  of  decision  analysis. 

The  examples  listed  so  far  are  fairly  simple  and  at  a  high  level.  However,  they
have  their  counterparts  within  the  internal  implementation  of  the  robot.  Indeed,  one 
problem  with  robot  systems  is  the  fusion  of  multiple  sensory  information.  This  is 
often  a  complicated  process,  particularly  if  one  of  the  sensors  is  at  the  limit  of  its 
range  of  applicability.  For  instance,  a  sonar  sensor  may  indicate  the  presence  of  an 
obstacle  in  front  of  the  robot,  which  an  infrared  sensor  may  not  see.  One  possibility 
is  simply  to  arbitrate  between  the  sensors  in  a  random  fashion.  In  short,  the  robot 
imagines  the  presence  or  absence  of  certain  features  in  the  environment,  based  on 
randomly  chosen  sensory  information.  There  are  issues  involved  here  in  deciding 
how  often  to  arbitrate,  and  whether  it  is  even  safe  to  arbitrate  randomly.  These  are 
precisely  the  issues  addressed  by  the  planning  methodology  presented  in  this  thesis. 
In  particular,  the  connectivity  assumption  of  section  3.2.7  addresses  the  safety  issue. 
The  backchaining  process  using  the  operator  SELECT  of  section  3.9  addresses  the 
issue  of  when  to  randomize.  However,  much  work  remains  in  mapping  these  general
techniques  into  the  mobile  robot  domain. 

6.2.3  Design 

The  design  of  parts  and  sensors  stands  as  a  task  complementary  to  the  task  of 
planning  assembly  motions.  Clearly  the  design  problem  is  much  less  constrained; 
a  priori  the  space  of  possible  designs  has  an  enormously  large  number  of  degrees
of  freedom.  However,  much  can  be  learned  by  considering  how  particular  assembly 
strategies  succeed  or  fail.  The  class  of  randomized  strategies  provides  another  clue  to 
the  efficient  design  and  usage  of  parts  and  sensors. 

Sensor  Design:  A  Sensor  Placement  Example 

As  an  example  consider  again  a  random  walk  on  a  two-dimensional  grid.  As  we 
learned  in  chapter  3  the  natural  tendency  of  the  random  walk  is  to  move  away  from 
the  origin  whenever  it  is  positioned  on  one  of  the  axes  of  the  grid.  More  generally, 
a  continuous  random  walk  in  a  higher  dimensional  space  has  a  natural  tendency  to 
drift  away  from  a  goal  region  situated  at  the  origin.  Taking  the  two-dimensional 
random  walk  as  an  example,  suppose  that  we  installed  a  couple  of  one-bit  sensors
on  the  axes  of  the  grid.  These  might  be  implemented  as  light  beams  parallel  to  the 
grid  axes.  Then  one  could  reduce  the  two-dimensional  random  walk  to  a  pair  of 
one-dimensional  random  walks.  Recall  that  in  a  one-dimensional  random  walk  the 
average  motion  progress  of  the  system  is  zero,  rather  than  away  from  the  goal.  Thus
if  there  is  any  additional  sensing,  the  system  will  naturally  move  towards  the  goal  on 
average. 
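
The contrast between one and higher dimensions can be made explicit with a short calculation. As an illustrative assumption (this is not the grid model of chapter 3), take a single step of fixed length $\delta$ in a uniformly random direction from a point at distance $r \gg \delta$ from the origin in $n$ dimensions. Let $c$ be the cosine of the angle between the step and the outward radial direction, so that $E[c] = 0$ and, by symmetry, $E[c^2] = 1/n$. The new distance $r'$ then satisfies:

```latex
\begin{align*}
r' &= \sqrt{r^2 + 2 r \delta c + \delta^2}
    = r + \delta c + \frac{\delta^2}{2r}\,\bigl(1 - c^2\bigr)
      + O\!\left(\frac{\delta^3}{r^2}\right), \\
E[r' - r] &\approx \frac{\delta^2}{2r}\left(1 - \frac{1}{n}\right)
    = \frac{(n-1)\,\delta^2}{2\,n\,r}.
\end{align*}
```

For $n = 1$ the expected change in distance vanishes, whereas for $n \geq 2$ it is positive, which is the outward drift referred to above. Under this model, any sensing that biases the one-dimensional walk even slightly towards the origin is enough to produce average progress.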

Specifically,  one  would  let  the  system  perform  a  two-dimensional  random  walk
until  it  crossed  one  of  the  light  beams.  Since  the  light  beams  cover  two  lines,  the 
system  effectively  behaves  as  if  it  were  performing  a  one-dimensional  random  walk 
with  a  goal  recognizer  at  the  origin.  Upon  observing  that  a  light  beam  has  been 
crossed,  and  remembering  which  one,  the  system  can  then  perform  a  one-dimensional 
random  walk  along  the  appropriate  axis  until  the  goal  at  the  center  of  the  two- 
dimensional  grid  is  attained.  The  reliability  with  which  the  system  can  perform  the 
one-dimensional  random  walk  depends  of  course  on  the  control  uncertainty.  All  the 
sensing  in  the  world  is  of  no  use  if  the  control  uncertainty  is  bad  enough.  However, 
assuming  reliable  control  but  possibly  poor  sensing,  this  example  demonstrates  how  an 
understanding  of  the  capabilities  of  a  randomized  strategy  may  be  used  for  designing 
sensor  placements. 
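
The sensor placement idea can be tried out in simulation. The sketch below makes several assumptions beyond the text: the grid is bounded (positions are clamped to the square of half-width `r`, so that all hitting times are finite), control is perfect so that the system can stay on an axis once it has found it, and the goal at the origin is recognizable. The function names are hypothetical.

```python
import random

STEPS_2D = ((1, 0), (-1, 0), (0, 1), (0, -1))

def clamp(v, r):
    """Keep a coordinate inside the bounded grid [-r, r]."""
    return max(-r, min(r, v))

def step_2d(x, y, r, rng):
    """One step of the two-dimensional random walk on the bounded grid."""
    dx, dy = rng.choice(STEPS_2D)
    return clamp(x + dx, r), clamp(y + dy, r)

def walk_2d_only(start, r, rng):
    """Steps taken by a plain two-dimensional random walk to reach the origin."""
    x, y = start
    steps = 0
    while (x, y) != (0, 0):
        x, y = step_2d(x, y, r, rng)
        steps += 1
    return steps

def walk_with_beams(start, r, rng):
    """Steps taken when one-bit light beams lie along both grid axes.

    Phase 1: walk in two dimensions until a beam (an axis) is crossed.
    Phase 2: walk in one dimension along that axis until the origin is reached.
    """
    x, y = start
    steps = 0
    while x != 0 and y != 0:
        x, y = step_2d(x, y, r, rng)
        steps += 1
    while (x, y) != (0, 0):
        d = rng.choice((1, -1))
        if x == 0:               # on the vertical axis: move only in y
            y = clamp(y + d, r)
        else:                    # on the horizontal axis: move only in x
            x = clamp(x + d, r)
        steps += 1
    return steps
```

Averaging each function over a few hundred runs from the same start, for instance `(3, 4)` with `r = 5`, gives a rough comparison of the two strategies on this toy model.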

Generalizations  of  this  example  involve  the  reduction  of  higher  dimensional 
random  walks  to  a  series  of  one-dimensional  random  walks  either  by  the  addition 
of  sensors  in  appropriate  locations  or  the  modification  of  strategies. 


Parts  Design 

We  have  already  alluded  to  the  design  of  part  shapes  in  the  discussion  of  part 
orienting  by  dropping.  An  understanding  of  the  dynamic  properties  and  stable  resting 
configurations  of  differently  shaped  objects  is  essential  to  the  design  of  parts  shaped 
for  assembly.  Randomization  provides  a  context  in  which  to  consider  these  dynamic 
properties.  Said  differently,  randomization  provides  a  means  of  assessing  the  natural 
motions  of  a  part.  This  information  is  useful  for  it  describes  the  possible  motions  of 
a  part  in  the  presence  of  control  error. 

Extending  the  analysis  of  natural  part  motions  in  order  to  actually  design  parts  is 
still  an  open  area.  The  study  of  randomization  as  a  means  of  facilitating  this  process 
should  be  a  fertile  area  of  future  research. 

A  design  criterion  related  to  the  notion  of  natural  behavior  is  made  evident  by  the 
implementation  of  the  peg-in-hole  task  and  by  the  example  of  section  2.4.  In  these 
cases,  randomization  helped  the  system  find  a  path  or  region  from  which  progress 
towards  the  goal  was  rapid.  This  success  was  possible  in  these  examples  because 
of  the  system's  ability  to  approach  the  goal  from  an  arbitrary  direction.  Generally, 
that  might  not  be  possible.  However,  by  considering  the  manner  in  which  a  system 
uses  information,  the  regions  in  which  it  randomizes  its  motions,  and  the  regions  of 
fast  convergence,  a  designer  can  determine  whether  or  not  a  system  will  naturally
gravitate  towards  regions  of  fast  convergence.  This  analysis  can  then  be  used  to 
redesign  the  system  if  necessary. 


6.3  Further  Future  Work

We  have  indicated  above  numerous  areas  in  which  randomization  may  prove  fruitful. 
Let  us  now  briefly  indicate  some  very  specific  topics  in  the  thesis  that  deserve  further 
attention. 

6.3.1  Task  Solvability 

One  of  the  motivations  for  this  thesis  was  to  work  towards  an  understanding  of  the 
class  of  tasks  solvable  by  different  repertoires  of  actions.  The  appeal  of  randomized 
strategies  lies  in  their  simplicity  and  pervasive  presence.  We  have  shown  that 
randomized  strategies  can  increase  the  class  of  solvable  tasks  over  those  solvable  by 
bounded-step  guaranteed  strategies.  We  have  also  indicated  the  manner  in  which 
randomization  can  facilitate  task  solutions,  even  when  bounded-step  guaranteed 
strategies  exist.  Nonetheless,  there  is  still  missing  a  language  in  which  one  can  talk 
about  task  solvability  and  compare  different  repertoires  of  actions.  Even  more  difficult 
is  the  actual  characterization  of  tasks  and  strategies  in  terms  of  each  other.  Much 
work  remains  to  be  done  in  this  area. 

6.3.2  Simple  Feedback  Loops 

Conditions  of  Rapid  Convergence 

We  analyzed  a  randomized  simple  feedback  loop  for  the  two-dimensional  task  of 
attaining  a  circular  region  in  the  plane.  The  strategy  was  formulated  in  general  terms. 
However,  the  results  that  we  obtained  indicating  fast  convergence  were  numerical 
results  that  assumed  particular  uncertainty  values.  While  the  qualitative  behavior 
of  the  system  is  similar  for  varying  uncertainty  values,  it  is  desirable  to  obtain  a 
set  of  explicit  conditions  formulated  in  terms  of  arbitrary  uncertainty  variables  that 
characterize  the  regions  of  fast  convergence.  Part  of  the  difficulty  in  determining  these 
conditions  is  that  one  of  the  integrals  defining  the  expected  progress  of  the  feedback 
loop  does  not  possess  an  analytic  closed-form  solution.  It  may  be  useful  in  elucidating 
these  conditions  to  consider  lower  or  upper  bounds  for  this  integral. 

Biases 

The  analysis  of  the  simple  feedback  loop  assumed  unbiased  Gaussian  errors.  This 
simplified  the  problem  to  a  one-dimensional  problem  formulated  in  terms  of  the 
distance  of  the  system  from  the  origin.  We  discussed  the  qualitative  behavior  of  the 
system  once  biases  are  introduced.  Again,  it  would  be  useful  to  determine  explicit 
conditions  characterizing  the  regions  of  fast  convergence.  The  difficulty  here  is  twofold.
First,  introducing  biases  requires  that  one  solve  a  two-dimensional  diffusion
equation.  And  second,  recall  that  the  coefficients  of  this  partial  differential  equation 
are  determined  pointwise  by  a  double  integral.  Without  velocity  biases  the  outer 
integral  possesses  no  analytic  description.  In  the  presence  of  some  velocity  biases,
however,  even  the  inner  integral  is  an  elliptic  integral. 


More  Complicated  Tasks 

More  work  needs  to  be  done  on  solving  tasks  using  simple  feedback  loops.  As  a  first 
step,  one  should  consider  the  task  of  attaining  a  spherical  region  in  n-dimensional 
space,  with  n  greater  than  two.  It  would  be  interesting  to  see  whether  the  degradation 
of  convergence  times  is  simply  a  function  of  the  increased  drift  of  randomized 
strategies  in  higher  dimensional  spaces,  or  whether  sensing  degrades  as  well.  Another 
direction  to  explore  is  the  solution  of  tasks  in  which  line-of-sight  distance  is  not  a 
good  progress  measure.  One  question  is  whether  it  is  possible  to  use  distance  to  the
goal  as  a  progress  measure.  If  the  path  to  the  goal  bends  a  lot,  it  may  be  impossible  to 
guarantee  progress.  Finally,  a  third  direction  is  the  exploration  of  more  complicated 
sensors  and  sensor  models  than  those  assumed  in  this  thesis.  Symmetric  error  balls 
are  not  always  the  best  approximation  to  the  error  in  a  sensor.  Said  differently,  using 
an  error  ball  may  be  overly  conservative. 


Diffusion  Approximation 

More  work  is  required  in  the  modelling  of  simple  feedback  loops.  Of  particular  interest 
is  the  extent  to  which  diffusion  approximations  to  randomized  strategies  are  possible. 
An  important  criterion  is  the  reliability  of  predictions  based  on  these  approximations. 


6.3.3  Learning 

We  showed  through  the  peg-in-hole  implementation  and  its  various  abstractions  that 
a  system  can  compensate  for  sensing  biases  by  randomization.  The  system  employed 
a  simple  randomized  feedback  loop.  If  one  permits  the  system  to  retain  some  history 
then  it  can  actually  learn  from  its  observations,  and  obtain  an  estimate  of  the  bias. 
This  estimate  may  then  be  used  to  improve  performance.  A  Kalman  filter  is  one 
approach  for  retaining  history  and  obtaining  an  estimate  of  the  bias.  However,  one 
can  imagine  weaker  approaches  that  do  not  put  as  much  faith  in  their  estimates. 
A  weaker  approach  might  try  to  follow  the  philosophy  of  preparing  for  worst-case 
scenarios.  This  is  a  philosophy  that  underlies  the  guaranteed-planning  approaches 
and  that  also  underlies  the  decision  by  which  a  simple  feedback  loop  makes  progress. 
A  possible  learning  approach  might  consist  of  simply  recognizing  that  convergence 
tends  to  be  fast  from  certain  regions  in  state  space.  In  other  words,  no  explicit 
estimate  is  made  of  the  sensing  bias.  Rather,  it  is  estimated  indirectly,  by  delineating 
certain  regions  that  might  serve  as  subgoals,  since  convergence  from  them  is  probably 
quick.  In  this  case  we  are  really  talking  about  history  across  multiple  iterations  of 
a  strategy  as  opposed  to  history  within  a  strategy,  though  both  are  possible.  Much 
work  remains  to  be  done  in  learning  based  on  randomization. 
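
As a small illustration of retaining history, the sketch below estimates a constant sensing bias with a running average. It assumes a setting that the text does not spell out: each time the goal is attained and recognized, the true position is known, so the difference between the sensed and true positions is a noisy sample of the bias. The class name and interface are hypothetical; a running average is used here as the simplest recursive estimator, and a Kalman filter would generalize it by weighting samples according to modeled uncertainties.

```python
import random

class BiasEstimator:
    """Running-average estimate of a constant sensing bias."""

    def __init__(self):
        self.count = 0
        self.estimate = 0.0

    def update(self, residual):
        """Fold in one residual (sensed position minus known true position)."""
        self.count += 1
        self.estimate += (residual - self.estimate) / self.count
        return self.estimate

def simulate(true_bias, noise_sigma, samples, rng):
    """Feed noisy residuals to the estimator and return the final estimate."""
    estimator = BiasEstimator()
    for _ in range(samples):
        estimator.update(true_bias + rng.gauss(0.0, noise_sigma))
    return estimator.estimate
```

A weaker scheme of the kind suggested above would not trust this estimate outright, but might use it only to propose subgoal regions from which convergence has tended to be fast.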

6.3.4  Solution  Sensitivity

Randomization  may  be  thought  of  as  a  perturbation  in  the  space  of  task  solutions. 
By  randomizing,  a  system  hopes  to  find  a  solution  that  matches  the  unknown  initial 
conditions  of  the  world.  An  interesting  inverse  problem  is  to  determine  the  manner  in 
which  a  task  solution  must  change  in  order  to  remain  applicable  as  one  perturbs  the 
initial  conditions  of  the  system.  It  seems  that  there  are  critical  values  of  uncertain 
parameters  at  which  task  solutions  change  drastically.  Randomization  offers  a  means 
of  retaining  an  inapplicable  solution  by  perturbing  around  this  solution.  However,  the 
perturbations  may  have  to  be  great.  Whether  the  nature  of  the  perturbation  required 
to  solve  the  task  can  be  inferred  from  a  study  of  the  sensitivity  of  task  solutions  to 
task  parameters  is  an  interesting  and  open  question.  Answering  this  question  is  likely 
also  to  further  advance  the  characterization  of  task  solvability  and  strategy  scope. 